WO2023095150A1

WO2023095150A1 - First node, second node, communications system and methods performed thereby for handling predictive models

Info

Publication number: WO2023095150A1
Application number: PCT/IN2021/051091
Authority: WO
Inventors: Sunil Kumar Vuppala; Manguluri BHASKAR; Ashutosh Bisht; Deepankar MAHAPATRO; Sai HAREESH
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2023-06-01

Abstract

A computer-implemented method, performed by a first node (111) in a communications system (100). The first node (111) obtains (201) predictive models of a plurality of events, and a respective set of characteristics of each model. The first node (111) determines (202), using machine-learning and the respective set of characteristics: i) one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict a second event, and ii) a respective expected benefit of each of the determined one or more first predictive models in predicting the second event. The first node (111) then initiates (204) providing a recommendation, to a second node (112), of whether or not to use transfer learning to predict the second event. The recommendation is based on the determined one or more first predictive models and the respective expected benefit.

Description

FIRST NODE, SECOND NODE, COMMUNICATIONS SYSTEM AND METHODS

PERFORMED THEREBY FOR HANDLING PREDICTIVE MODELS

TECHNICAL FIELD

The present disclosure relates generally to a first node and methods performed thereby for handling predictive models. The present disclosure also relates generally to a second node, and methods performed thereby, for handling the predictive models. The present disclosure also relates generally to a communications system, and methods performed thereby, for handling the predictive models. The present disclosure further relates generally to computer programs and computer-readable storage mediums, having stored thereon the computer programs to carry out these methods.

BACKGROUND

Computer systems in a communications network may comprise one or more network nodes. A node may comprise one or more processors which, together with computer program code may perform different functions and actions, a memory, a receiving port and a sending port. A node may be, for example, a server. Nodes may perform their functions entirely on the cloud.

The communications network may cover a geographical area which may be divided into cell areas, each cell area being served by another type of node, a network node in the Radio Access Network (RAN), radio network node or Transmission Point (TP), for example, an access node such as a Base Station (BS), e.g. a Radio Base Station (RBS), which sometimes may be referred to as e.g., evolved Node B (“eNB”), “eNodeB”, “NodeB”, “B node”, or Base Transceiver Station (BTS), depending on the technology and terminology used. The base stations may be of different classes such as e.g., Wide Area Base Stations, Medium Range Base Stations, Local Area Base Stations and Home Base Stations, based on transmission power and thereby also cell size. A cell is the geographical area where radio coverage is provided by the base station at a base station site. One base station, situated on the base station site, may serve one or several cells. Further, each base station may support one or several communication technologies. The telecommunications network may also comprise network nodes which may serve receiving nodes, such as user equipments, with serving beams.

User Equipments (UEs) within the communications network may be e.g., wireless devices, stations (STAs), mobile terminals, wireless terminals, terminals, and/or Mobile Stations (MS). UEs may be understood to be enabled to communicate wirelessly in a cellular communications network or wireless communication network, sometimes also referred to as a cellular radio system, cellular system, or cellular network. The communication may be performed e.g., between two UEs, between a wireless device and a regular telephone and/or between a wireless device and a server via a Radio Access Network (RAN) and possibly one or more core networks, comprised within the wireless communications network. UEs may further be referred to as mobile telephones, cellular telephones, laptops, or tablets with wireless capability, just to mention some further examples. The UEs in the present context may be, for example, portable, pocket- storable, hand-held, computer-comprised, or vehicle-mounted mobile devices, enabled to communicate voice and/or data, via the RAN, with another entity, such as another terminal or a server.

In 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE), base stations, which may be referred to as eNodeBs or even eNBs, may be directly connected to one or more core networks. In the context of this disclosure, the expression Downlink (DL) may be used for the transmission path from the base station to the user equipment. The expression Uplink (UL) may be used for the transmission path in the opposite direction i.e., from the wireless device to the base station.

The standardization organization 3GPP is currently in the process of specifying a New Radio Interface called NR or 5G-UTRA, as well as a Fifth Generation (5G) Packet Core Network, which may be referred to as Next Generation (NG) Core Network, abbreviated as NG-CN, NGC or 5G CN.

The digitalization of communications systems has enabled the eruption of machine learning techniques in communications. Machine learning may be understood as analytical procedures wherein a computer may analyze and autonomously learn patterns from data to build mathematical models of different events that may be used to the occurrence of different events.

Within machine learning, transfer learning may be understood to refer to the reuse of a pre-trained model to predict one event to the prediction of another event. The benefit of transfer learning may be understood to be the saving of time and resources to train a predictive model, by re-using an old model, so that the new predictive model for a new event may be generated in a shorter amount of time. Data and models are also expected to drift, for example, due to, e.g., a change in distribution of input data, addition of new network cells or nodes, a change in traffic pattern, and that may make all previous data obsolete an any already built models inaccurate. Transfer learning may be understood to help in quickly coming up with solution in such changing scenario and this may be dependent on a correct source selection.

In order to leverage transfer learning, a source model may need to be selected. Choosing the right source model for transfer learning may be understood to be important since when the source model selection is incorrect, no benefits may be obtained with transfer learning. Such problem of correct selection source model for transfer learning exists in the RAN context as well.

Transferability and source selection is currently an active area of research. Transfer learning has been successful particularly in the fields of computer vision and in text.

Application of transfer learning to the analysis of timeseries is still in its infancy. Current works of transferability are either empirical, that is, they may be based on a brute force approach, as e.g., opposed to choosing a source model through theoretical logic or established thumb rules with theoretical backings. In other existing methods, transferability may be performed by examining a human judged similarity, or a statistical estimate of similarity, of a dataset to be analyzed with an older dataset for which a predictive model may have already been built. However, metrics that have been proposed to select a source model in fields other than telecommunications, e.g., computer vision, and which may take advantage of large scale available data, are not in sync with results empirically observed in experiments in with telecommunications data, and are therefore not suitable in the telecommunications field. Some of the existing approaches to select source model may, for example, use an ImageNet dataset for classification problems. Others may assume labels, that is, categories, as random and may use an H-score, which may be understood to be suitable for classification problems, for source selection. Yet other methods may make assumptions on the data that may not be adjusted to the actual properties of the data.

As transferability is new, many approaches are being explored using an information theory approach and in computer vision. It may take time for research to yield methods which may be practically flawless and universal. In the meantime, existing methods for selecting a source model for transfer learning may not yield good results with telecommunications data, resulting in higher overhead and higher latency that may eliminate any potential benefits that would otherwise be obtained from transfer learning.

SUMMARY

As part of the development of embodiments herein, one or more challenges with the existing technology will first be identified and discussed. In a communications system, predictive models may be built to predict different aspects of the performance or behavior of a cell. In other words, forecasting may be performed at cell level. If transfer learning is used, a source model may be equivalent to a model at a source cell. Considering that there may be, e.g., 30,000 cells in an area, e.g., a city, selection of source model becomes particularly relevant, as empirical approaches may annul any potential benefits that may otherwise be obtained from transfer learning.

An ultimate goal of embodiments herein may be understood to be to efficiently build machine learning (ML) models to predict different events of interest. To build such ML models efficiently, transfer learning may be used. In transfer learning, an existing model, referred to as a source model, may be used as a model from which knowledge may be borrowed as a head start to build a predictive model for another event for which no predictive model may be yet available. While transfer learning may not be the direct solution, it may be understood as an effective way of using less data and compute to build the new model, since the new model built with transfer learning may provide similar or better performance than if the new model were to be built from scratch. If transfer learning may be used, the advantages may be understood to be faster convergence, less data requirement, higher energy efficiency and better generalization.

Transfer learning may provide an advantage in cases where data for the event of interest that may need to be predicted, which may be referred to as target data, may be limited and so may be compute power. However, transfer learning may not always be useful. In some cases, training a model from scratch may be seen to have better performance and convergence, after some epochs, than a model built using transfer learning. The choice of source model may be understood to impact the advantages from transfer learning. Embodiments herein may be understood to be drawn, in general, to source model selection when using transfer learning.

According to the foregoing, it is an object of embodiments herein to improve the handling of predictive models. More particularly, it is an object of embodiments herein to improve the handling of predictive models, in a communications system.

According to a first aspect of embodiments herein, the object is achieved by a computer- implemented method, performed by a first node. The method is for handling predictive models. The first node operates in a communications system. The first node obtains a plurality of predictive models of a plurality of events, and a respective set of characteristics of each predictive model in the plurality. The first node determines, using machine-learning and the respective set of characteristics, out of the plurality, one or more first predictive models of a first event, of the plurality of events. The one or more first predictive models of the first event are to be used as a source model in transfer learning to predict a second event. The first node also determines, using machine-learning and the respective set of characteristics, and out of the plurality, a respective expected benefit of each of the determined one or more first predictive models in predicting the second event. The first node then initiates providing a recommendation, to a second node operating in the communications system. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is based on the determined one or more first predictive models and the respective expected benefit.

According to a second aspect of embodiments herein, the object is achieved by a computer-implemented method, performed by the second node. The method is for handling the predictive models. The second node operates in the communications system. The second node receives the recommendation from the first node operating in the communications system. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is based on, out of the plurality of predictive models of the plurality of events: the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event. The one or more first predictive models of the first event have been determined using machine-learning and based on the respective set of characteristics of each predictive model in the plurality. The recommendation is also based on, out of the plurality of predictive models of the plurality of events, the respective expected benefit of each of the determined one or more first predictive models in predicting the second event. The second node then initiates predicting the second event based on the received recommendation.

According to a third aspect of embodiments herein, the object is achieved by a computer- implemented method, performed by the communications system. The method is for handling the predictive models. The communications system comprises the first node and the second node. The method comprises obtaining, by the first node, the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality. The method also comprises determining, by the first node, using machine-learning and the respective set of characteristics, out of the plurality, the one or more first predictive models of the first event, of the plurality of events. The one or more first predictive models of the first event are to be used as the source model in transfer learning to predict the second event. The method also comprises determining, by the first node, using machine-learning and the respective set of characteristics, and out of the plurality, the respective expected benefit of each of the determined one or more first predictive models in predicting the second event. The method further comprises initiating, by the first node, providing the recommendation, to the second node operating in the communications system. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is based on the determined one or more first predictive models and the respective expected benefit. The method additionally comprises receiving, by the second node, the recommendation. The method further comprises initiating, by the second node, predicting the second event based on the received recommendation.

According to a fourth aspect of embodiments herein, the object is achieved by the first node, for handling the predictive models. The first node is configured to operate in the communications system. The first node is further configured to obtain the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality. The first node is also configured to determine, using machine-learning and the respective set of characteristics, out of the plurality: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, and ii) the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event. The first node is further configured to initiate providing the recommendation, to the second node configured to operate in the communications system. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined.

According to a fifth aspect of embodiments herein, the object is achieved by the second node, for handling the predictive models. The second node is configured to operate in the communications system. The second node is further configured to receive, from the first node configured to operate in the communications system, the recommendation of whether or not to use transfer learning to predict the second event. The recommendation is configured to be based on, out of the plurality of predictive models of the plurality of events, the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event. The one or more first predictive models of the first event, of the plurality of events, to be used as the source model are configured to be determined using machine-learning and based on the respective set of characteristics of each predictive model in the plurality. The recommendation is also configured to be based on, out of the plurality of predictive models of the plurality of events, the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event. The second node is also configured to initiate predicting the second event based on the recommendation configured to be received.

According to a sixth aspect of embodiments herein, the object is achieved by the communications system, for handling the predictive models. The communications system comprises the first node and the second node. The communications system is further configured to obtain, by the first node, the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality. The communications system is also configured to determine, by the first node, , using machinelearning and the respective set of characteristics, out of the plurality: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, and ii) the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event. The communications system is further configured to initiate, by the first node, providing the recommendation, to the second node configured to operate in the communications system. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined. The communications system is additionally configured to receive, by the second node, from the first node, the recommendation. The communications system is also configured to initiate, by the second node, performing the predicting the second event based on the recommendation configured to be received.

According to a seventh aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the first node.

According to an eighth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the first node. According to a ninth aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the second node.

According to a tenth aspect of embodiments herein, the object is achieved by a computer- readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the second node.

According to an eleventh aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the communications system.

According to a twelfth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the communications system.

By obtaining the plurality of predictive models of a plurality of events and the respective set of characteristics of each predictive model in the plurality, the first node may be enabled to then train the machine-learning model, to select the predictive model that may be the most optimal source model to be used to predict another event of interest.

By determining the one or more first predictive models of the first event to be used as the source model in transfer learning to predict the second event, as well as their respective expected benefit, the first node may enable to use a machine-learning based approach for source selection, which may be trainable/learnable, as opposed to static/handcrafted rule-based traditional methods. This may in turn enable to achieve better generalizability, and easier life cycle management of ML models, since as data may change, the required number of days for updating a model may be minimized with transfer learning. Moreover, less data storage requirements may be enabled, as less data and models may need to be generated and stored, which further may reduce the energy consumed and compute time.

In the case where there may be less data for the second event, that it would otherwise be required to train a new model from scratch, transfer learning may be understood to be the most successful approach in literature as of today. For example, there may be difficulties in obtaining more data and using it for modelling, e.g., in a target cell, which may even delay the time it may take to achieve a target predictive model for the second event. It may be understood that if transfer learning is used, these requirements may be minimized. However, in order to use transfer learning effectively, and achieve its benefits, good selection of a source model may be understood to be needed for recommendation.

By initiating providing the recommendation, the first node may enable to recommend whether transfer leaning may add value to a specific case, as opposed to building a model from scratch, and may indicate if it may be helpful to apply transfer learning only for the relevant cases.

By the second node receiving the recommendation, and the second node then initiating predicting the second event based on the recommendation, the first node may therefore be understood to enable the second node to achieve the advantages in time and compute for many events, e.g., wherein transfer learning may be beneficial, and to refrain from applying transfer learning when it may provide no benefits. Hence, accuracy in predicting the second event may be optimized, while optimizing use of time and compute resources.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of embodiments herein are described in more detail with reference to the accompanying drawings, according to the following description.

Figure 1 is a schematic diagram illustrating a non-limiting example of a communications system, according to embodiments herein.

Figure 2 is a flowchart depicting embodiments of a method in a first node, according to embodiments herein.

Figure 3 is a flowchart depicting embodiments of a method in a second node, according to embodiments herein.

Figure 4 is a flowchart depicting embodiments of a method in a communications system, according to embodiments herein.

Figure 5 is a schematic diagram depicting a non-limiting example of a method according to embodiments herein.

Figure 6 is a schematic diagram depicting a non-limiting example of a method according to embodiments herein.

Figure 7 is a schematic diagram depicting a non-limiting example of aspects related to a method according to embodiments herein.

Figure 8 is a schematic diagram depicting a non-limiting example of aspects related to a method according to embodiments herein. Figure 9 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a first node, according to embodiments herein.

Figure 10 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a second node, according to embodiments herein.

Figure 11 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a communications system, according to embodiments herein.

DETAILED DESCRIPTION

Certain aspects of the present disclosure and their embodiments address the challenges identified in the Background and Summary sections with the existing methods and provide solutions to the challenges discussed.

As a summarized overview, embodiments herein may be understood to relate to a method, and system for source selection in transfer learning. Particular embodiments herein may relate to a method, and system for source selection in transfer learning for KPI predictions.

Embodiments herein may provide a more generic approach than the existing methods, which may be a learnable system which may improve in performance as more data may be used to train it. Particularly, embodiments herein may be based on a recommender system technique for source model selection for transfer learning. Further particularly, the transfer learning may be applied for time series problems such as forecasting and predictions in a RAN domain of a communications system.

The embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, embodiments herein are illustrated by exemplary embodiments. It should be noted that these embodiments are not mutually exclusive. Components from one embodiment or example may be tacitly assumed to be present in another embodiment or example and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description.

Figure 1 depicts two non-limiting examples, in panels “a” and “b”, respectively, of a communications system 100, in which embodiments herein may be implemented. In some example implementations, such as that depicted in the non-limiting example of Figure la, the communications system 100 may be a computer network. In other example implementations, such as that depicted in the non-limiting example of Figure lb, the communications system 100 may be implemented in a telecommunications system, sometimes also referred to as a telecommunications network, cellular radio system, cellular network or wireless communications system. In some examples, the telecommunications system may comprise network nodes which may serve receiving nodes, such as wireless devices, with serving beams.

In some examples, the telecommunications system may for example be a network such as 5G system, or a newer system supporting similar functionality, or a Long-Term Evolution (LTE) network, e.g. LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), LTE Half-Duplex Frequency Division Duplex (HD-FDD), or LTE operating in an unlicensed band. The telecommunications system may also support other technologies, such as, for example, Wideband Code Division Multiple Access (WCDMA), Universal Terrestrial Radio Access (UTRA) TDD, Global System for Mobile communications (GSM) network, GSM/Enhanced Data Rate for GSM Evolution (EDGE) Radio Access Network (GERAN) network, Ultra-Mobile Broadband (UMB), EDGE network, network comprising of any combination of Radio Access Technologies (RATs) such as e.g. Multi-Standard Radio (MSR) base stations, multi-RAT base stations etc., any 3rd Generation Partnership Project (3GPP) cellular network, Wireless Local Area Network/s (WLAN) or WiFi network/s, Worldwide Interoperability for Microwave Access (WiMax), IEEE 802.15.4-based low-power short-range networks such as IPv6 over Low-Power Wireless Personal Area Networks (6LowPAN), Zigbee, Z-Wave, Bluetooth Low Energy (BLE), or any cellular network or system. The telecommunications system may for example support a Low Power Wide Area Network (LPWAN). LPWAN technologies may comprise Long Range physical layer protocol (LoRa), Haystack, SigFox, LTE-M, and Narrow-Band loT (NB-IoT).

The communications system 100 may comprise a plurality of nodes, whereof a first node 111, and a second node 112 are depicted in Figure 1. Any of the first node 111 and the second node 112 may be understood, respectively, as a first computer system and a second computer system. In some examples, any of the first node 111 and the second node 112 may be implemented as a standalone server in e.g., a host computer in the cloud 120, as depicted in the non-limiting example depicted in panel b) of Figure 1. Any of the first node 111 and the second node 112 may in some examples be a distributed node or distributed server, with some of their respective functions being implemented locally, e.g., by a client manager, and some of its functions implemented in the cloud 120, by e.g., a server manager. Yet in other examples, any of the first node 111 and the second node 112 may also be implemented as processing resources in a server farm.

In some embodiments, any of the first node 111 and the second node 112 may be independent and separated nodes. In some embodiments, the first node 111 and the second node 112 may be one of: co-localized and the same node. All the possible combinations are not depicted in Figure 1 to simplify the Figure.

It may be understood that the communications system 100 may comprise more nodes than those represented on panel a) of Figure 1.

In some examples of embodiments herein, the first node 111 may be understood as a node having a capability to train a predictive model using machine learning in the communications system 100. A non-limiting example of the first node 111 may be a server. In embodiments wherein the communications system 100 may be a 5G network, the first node

111 may be e.g., a Network Data Analytics Function (NWDAF).

The second node 112 may be a node having a capability to train a machine learning predictive model. In particular examples, the second node 112 may be another server. In embodiments wherein the communications system 100 may be a 5G network, the second node

112 may be e.g., another Network Data Analytics Function (NWDAF).

The communications system 100 may comprise one or more radio network nodes, whereof a radio network node 130 is depicted in Figure 1. The radio network node 130 may typically be a base station or Transmission Point (TP), or any other network unit capable to serve a device or a machine type node in the communications system 100. The radio network node 130 may be e.g., a 5G gNB, a 4G eNB, or a radio network node in an alternative 5G radio access technology, e.g., fixed or WiFi. The radio network node 130 may be e.g., a Wide Area Base Station, Medium Range Base Station, Local Area Base Station and Home Base Station, based on transmission power and thereby also coverage size. The radio network node 130 may be a stationary relay node or a mobile relay node. The radio network node 130 may support one or several communication technologies, and its name may depend on the technology and terminology used. The radio network node 130 may be directly connected to one or more networks and/or one or more core networks.

The communications system 100 may cover a geographical area, which in some embodiments may be divided into cell areas, wherein each cell area may be served by a radio network node, although, one radio network node may serve one or several cells. The network node 130 may be of different classes, such as, e.g., macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size. In some examples, the network node 130 may serve receiving nodes with serving beams. The radio network node may support one or several communication technologies, and its name may depend on the technology and terminology used. Any of the radio network nodes that may be comprised in the communications network 100 may be directly connected to one or more core networks.

The communications system 100 may comprise a plurality of devices whereof a device 140 is depicted in Figure 1. The device 140 may be also known as e.g., user equipment (UE), a wireless device, mobile terminal, wireless terminal and/or mobile station, mobile telephone, cellular telephone, laptop with wireless capability, a Customer Premises Equipment (CPE), a thing in an internet of things network, or a sensor, just to mention some further examples. The device 140 in the present context may be, for example, portable, pocket- storable, hand-held, computer-comprised, or a vehicle-mounted mobile device, enabled to communicate voice and/or data, via a RAN, with another entity, such as a server, a laptop, a Personal Digital Assistant (PDA), or a tablet computer, sometimes referred to as a tablet with wireless capability, or simply tablet, a Machine-to-Machine (M2M) device, a device equipped with a wireless interface, such as a printer or a file storage device, modem, Laptop Embedded Equipped (LEE), Laptop Mounted Equipment (LME), USB dongles, CPE or any other radio network unit capable of communicating over a radio link in the communications system 100. The device 140 may be wireless, i.e., it may be enabled to communicate wirelessly in the communications system 100 and, in some particular examples, may be able support beamforming transmission. The communication may be performed e.g., between two devices, between a device and a radio network node, and/or between a device and a server. The communication may be performed e.g., via a RAN and possibly one or more core networks, comprised, respectively, within the communications system 100.

The first node 111 may communicate with the second node 112 over a first link 151, e.g., a radio link or a wired link. The first node 111 may communicate with the radio network node 130 over a second link 152, e.g., a radio link or a wired link. The radio network node 130 may communicate, directly or indirectly, with the device 140 over a third link 153, e.g., a radio link or a wired link. Any of the first link 151, the second link 152 and/or the third link 153 may be a direct link or it may go via one or more computer systems or one or more core networks in the communications system 100, or it may go via an optional intermediate network. The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet, which is not shown in Figure 1. In general, the usage of “first”, “second”, and/or “third” herein may be understood to be an arbitrary way to denote different elements or entities, and may be understood to not confer a cumulative or chronological character to the nouns these adjectives modify.

Although terminology from Long Term Evolution (LTE)/5G has been used in this disclosure to exemplify the embodiments herein, this should not be seen as limiting the scope of the embodiments herein to only the aforementioned system. Other wireless systems support similar or equivalent functionality may also benefit from exploiting the ideas covered within this disclosure. In future telecommunication networks, e.g., in the sixth generation (6G), the terms used herein may need to be reinterpreted in view of possible terminology changes in future technologies.

Embodiments of a computer-implemented method, performed by the first node 111, will now be described with reference to the flowchart depicted in Figure 2. The method may be understood to be for handling predictive models. The first node 111 operates in the communications system 100.

In some embodiments, the wireless communications network 100 may support at least one of: New Radio (NR), Long Term Evolution (LTE), LTE for Machines (LTE-M), enhanced Machine Type Communication (eMTC), and Narrow Band Internet of Things (NB-IoT).

The method may comprise the actions described below. In some embodiments, all the actions may be performed. In other embodiments, some of the actions may be performed. One or more embodiments may be combined, where applicable. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description. A nonlimiting example of the method performed by the first node 111 is depicted in Figure 2. In Figure 2, optional actions in some embodiments may be represented with dashed lines.

Action 201

As stated earlier, embodiments herein may be understood to be drawn, in general, to source model selection when using transfer learning, so that a predictive model for an event may be built more efficiently, that is in less time and/or with less resources. Particularly, embodiments herein may be understood to be drawn to an assessment of whether transfer learning with a selected source model may be more efficient or not than building a new model from scratch, when trying to predict an event for which no predictive model may have already been built. To achieve this goal, embodiments herein may be understood to be drawn, in general, to building, and then using, a machine-learning model, referred to herein as a recommender system, for source model selection when using transfer learning, and for assessment of whether or not transfer learning may be beneficial. Embodiments herein may therefore comprise, as will be explained in the next action, training a machine-learning model to select one or more first predictive models of a first event to predict a second event.

In order to ultimately build such a machine-learning model, in this Action 201, the first node 111 first obtains a plurality of predictive models of a plurality of events. That is, the first node 111 may first obtain a catalogue of source models for respectively predicting events other than an event that may be of interest and for which no predictive model may be available.

In this Action 201, the first node 111 also obtains a respective set of characteristics of each predictive model in the plurality. The respective set of characteristics of each model may comprise for example, the respective event that the respective predictive model may predict, e.g., a KPI at a certain cell, the features or factors, that is, the independent variables, that may be comprised in the model, attributes or metadata of the data source(s) used to build the respective model, such as, e.g., in the case of a cell, its configuration, e.g., frequency band, capacity, operational data, e.g., traffic, type of site, e.g., urban, rural, semi urban, indoor, outdoor etc.., Physical Resource Block (PRB) Utilisation, Signal to Interference Noise Ratio (SINR), distribution of the data used for building the respective model, etc. . .

The obtaining in this Action 201 may comprise, retrieving, collecting, generating, measuring or receiving, e.g., from another node, which may be operating in the communications system 100. That is, in some embodiments, in this Action 201, the first node 111 may receive at least some of the predictive models in the plurality of predictive models of the plurality of events from one or more other nodes, which may have either generated them or stored them. In other embodiments, one or more of the predictive models in the plurality of predictive models of the plurality of events may be generated by the first node 111 itself, e.g., using machine-learning, to build the catalogue of source models. The first node 111 may obtain these one or more of the predictive models in the plurality of predictive models of the plurality of events by empirically building different permutations of source and target. In other words, the first node 111 may trained to ‘select’ the most beneficial models from multiple models.

Particularly, the first node 111 in this Action 201, in what may be referred to as a first group of machine-learning procedures of embodiments herein, may obtain the respective events that may be the targets of the respective predictive models, the raw data sets of the respective events and features of the respective models, as well as the type of time series task that may be desired, e.g., forecasting, regression, etc.., and then build a corpus of source models based on task. It may be understood that there may be different ‘types of tasks’ for which different ML models may be used. Time series task may be understood to be those where data points may have time as one of the components, such as forecasting the weather/temperature 2-hours-from-now. These may be understood to be distinct from other tasks such as classification, where they may not be a need for time as a component, for example, classifying an image as cat-or-dog. The ‘type-of-task’ may be used as a categorical input data, which may represent a discrete item from a set of choices. Categorical data may be understood to be different from numeric data, which may be understood to be continuous and to have a notion of order, e.g., bigger, smaller etc.. The first node 111 may generate this corpus of source models by using an existing blueprint of methods, also referred to as architectures in DL, which may vary by varying some variables or hyperparameters in machine learning systems, defined as a part of embodiments herein. The blueprint may be used to train models to build transfer learning scenarios selecting some models as source models and some events as target events. A score of transfer learning may thereby be obtained. The resulting corpus of source models based on task that may be empirically built in this Action 201 may be referred to herein a feature engineered data, and it may then be used to train the machine-learning model in the next Actions.

By obtaining the plurality of predictive models of a plurality of events and the respective set of characteristics of each predictive model in the plurality in this Action 201, the first node 111 may be enabled to then train the machine-learning model in the next Actions, to select the predictive model that may be the most optimal source model to be used to predict another event of interest, as well as to evaluate if transfer learning may be beneficial or not to predict the second event.

Action 202

In this Action 202, the first node 111 determines, using machine-learning and the respective set of characteristics, out of the plurality of predictive models of the plurality of events, one or more first predictive models of a first event, of the plurality of events. The one or more first predictive models are to be used as a source model in transfer learning to predict a second event. The second event may be understood to be the event of interest, for which no predictive model may be available. The source model may be understood to be the model used for leveraging transfer learning to predict the second event. Determining in this Action 202 may be understood as calculating, generating, e.g., by training, or deriving.

As opposed to a static rule-based selection, the determining of the one or more first predictive models of the first event in this Action 202 may be performed with a recommender system. That is, a machine-learning model, that may be, for example, based on Deep Learning (DL) techniques, which may learn to recommend the most appropriate source model for transfer learning from the plurality of predictive models of the plurality of events, that is, from the catalogue obtained in Action 201.

Not every first event may be advantageous to be used as a source model in transfer learning to predict the second event. There may be many multiple factors which may cause this variability, that is, any of the respective set of characteristics. For example, in the case a source model may correspond to a cell the respective set of characteristics may comprise any of its coverage, capacity, load, distribution of KPI’s, geographical factors, congestion, neighbouring cell behaviour, etc.. The determining in this Action 202 is thereby based on the respective set of characteristics.

In this Action 202, the first node 111 also determines, using machine-learning and the respective set of characteristics, out of the plurality, a respective expected benefit of each of the determined one or more first predictive models in predicting the second event. Since the generation of a predictive model with machine-learning techniques may be understood to involve a certain amount of training time, the expected benefit, may be understood to be reduction in training time. The reduction in training time may be based on convergence. Convergence may be understood as an indication that a predictive model may have reached an error range of prediction, which may not be further improved with additional training. Convergence may be understood to be measured using empirical data. The error in prediction may be noted for each epoch, a time unit, during the training. This measurement may be done separately for a model trained from scratch, as well as using transfer learning. A difference in error between two approaches may be computed for each epoch. When the error does not change more than a configurable threshold, it may be said to have reaches convergence. As a part of training, it may be determined how well the model may be generalising on a separate data set. If the error is the same or better than when training a model from scratch, it may be concluded that the model has better convergence. An epoch may be one complete pass of training a dataset for model training. Multiple such passes may be understood to be required for model training. According to the foregoing, the determining in this Action 202 may be based on a score of transfer learning that may be calculated as: score of transfer learning = I (better convergence) * time saved(epochs)

Better convergence may occur when the transfer learning task may have lower error of prediction of the another event than when compared to training a new model from scratch.

In the above formula, I may be understood to be an indicator variable, which may take a value of 0 if better convergence is not observed. Time taken in epochs may be understood to be a measurement of a number of iterations over entire data the DL may take to reach convergence. In the convergence formula provided above, the time saved may be calculated by taking a difference between the time taken when training a predictive model for the event from scratch, with a time taken while training from a source model in units of epochs. This may be understood to be a one-time task during the gathering of input for the recommender system, step-1 in Figure 5, which will be describer later, no additional computation may be required during further steps. In other words, generating the training data may be understood to be expected to be done less often than using inferences from a fully trained model.

The expected benefit for each first predictive model may later be output in this Action 202 with a confidence score. A confidence score may be understood as a measure of certainty of the predictions. This may be understood to be a value bounded between 0-1 and may be interpreted as a probability that the suggestion is correct.

Since the determining of this Action 202 may be performed with a machine-learning mode, Action 202 may comprise a training phase of the machine-learning model, that is, of the recommender system, and an interference phase, wherein the already trained machine-learning model may be used for the second event.

Training phase

According to the foregoing, the determining in this Action 202 of the one or more first predictive models of the first event may comprise training a machine-learning model, that is, the recommender system, to select one or more predictive models of one event to predict another event. In other words, during the training phase, the machine learning model, may have been trained to select source models for a myriad of target events. Once trained, then the machine-learning model may later be used during the inference phase to predict the second event at hand. During the training phase, the recommender system may automatically learn, e.g., using DL techniques, the distribution characteristics of the input data which may give the best results, within a threshold, for the source model in a recommendation.

The selection of the one or more predictive models of one event to predict the another event may be based on the source models from the plurality of predictive models of the plurality of events that may be expected to provide a positive benefit. This may be based on comparing the convergence obtained with the source model in transfer learning to predict the another event, with the convergence that may be achieved if a model to predict the another event were built from scratch.

During the training phase, the machine-learning model may output several recommended source models with their respective scores of transfer learning advantage, e.g., calculated with the formula provided below. This output may be provided to users of the transfer learning, which may them provide feedback to the first node 111 on the recommendations. Accordingly, the training of the machine-learning model may be based on feedback provided on a convergence of the selected one or more predictive models with observed data of the another event. The feedback may be provided, for example, as a deviation between an expected benefit and an actual observed benefit when the recommended model may have been used for transfer learning. The feedback may be understood to enable to make the system up to date. A predictive model may have a lifecycle of training and inference. Once the trained model may be ideally available for inference entirely, practically, data being observed may change with respect to data used for training. If such deviation is observed between the recommendation and what is observed, this may be a trigger for retraining the recommender system.

The training phase may also comprise, during a number of iterations, Action 203, Action 204, Action 205, Action 206, Action 207, and Action 208, which are described below. An overview of the training phase of embodiments herein is depicted in Figure 6, which will be described later.

Inference phase

Once the machine-learning model, that is the recommender system may have been trained, e.g., once it may achieve a certain convergence value between model recommendation and the value available in a training dataset used for the first event, the machine-learning model may be then applied in the inference phase to determine, out of the plurality of predictive models of the plurality of events, the one or more first predictive models of a first event, of the plurality of events to be used as a source model in transfer learning to predict a second event. For the inference phase, the input may be the raw data of the second event, as well as the time series task, e.g., Forecasting, regression. More than one first predictive model of the first event may be chosen as each of the determined one or more first predictive models may then be output with its respective expected benefit of in predicting the second event, e.g., respective score of TL advantage. That is, the output may then be:

Source Model 1 + Score of TL Advantage

Source Model 2 + Score of TL Advantage

The inference phase of the method may also comprise performance of Action 203, which will be described next. An overview of the inference phase of embodiments herein is depicted in Figure 5, which will be described later.

Example

A particular goal of some embodiments herein may be understood to be to efficiently build machine learning (ML) models for RAN KPI forecasting, e.g., on per cell basis. Accordingly, in some particular embodiments, the second event may be a performance of a key performance indicator (KPI) of a radio access network, that is, a RAN of the communications system 100. As a non-limiting example, the KPI may be downlink user throughput (Mbps) at a cell, which may be referred to as a target cell. That is, a target cell may be understood to be the cell where it may be of interest to forecast the KPI. Other non-limiting examples of KPI may be Accessibility(%) and/or Retainability(%). As a non-limiting illustrative example, the second event may be, more particularly, to forecast hourly sampled downlink throughput KPI at a cell level.

In some of such embodiments, the recommender system, to perform the determining of this Action 202, may take the following inputs. A first input may be KPI related data at the target cell for downlink user throughput, accessibility, retainability. A second input may be meta data for the target cell, e.g., taking as example forecasting. The meta data may comprise: i) configuration: an indicator of coverage or capacity cell, e.g., frequency band number such as one of 14, 29, 5, 30, 46, and ii) operational data: historical traffic load on the cell, site type, e.g., urban, rural, semi urban, indoor, outdoor etc.., PRB Utilisation, SINR. Examples of operational data may be daily traffic average for last 7 days and site type. A third input may be task level configurations, such as number of steps into future for forecast, e.g., one of 1,4,10,24, number of days of data available at the target model, e.g., 7,30,60,90, backtracking number of points to be used from history for forecast, rolling window, that is, number of days, for time series from past, e.g., 1,4,10,24, etc... A target model may be understood to be a model trained on a target cell. There may be two options borrowing knowledge from another model trained on other cell, either using one of the source models or training from scratch.

This approach may be understood to be suitable for any downstream task that may explore utility of transfer learning for networks.

Within the context of the embodiments wherein the second event may be a performance of a KPI of a RAN of the communications system 100, a base cell may be understood to be the cell on which a source model and data, e.g., the first predictive model of the first event, may have been trained, at a particular cell. Not every base cell may be advantageous for building target model leveraging transfer learning. There are many multiple factors which may cause this variability in performance of each cell, that is, any of the respective set of characteristics, such as its coverage, capacity, load, distribution of KPI’s, geographical factors, congestion, neighbouring cell behaviour, etc... which may be understood to highlight the importance of selection of the source model.

By determining the one or more first predictive models of the first event to be used as the source model in transfer learning to predict the second event, as well as their respective expected benefit in this Action 202, the first node 111 may enable to use a machine-learning based approach for source selection, which may be trainable/leamable, as opposed to static/handcrafted rule -based traditional methods. This may in turn enable to achieve better generalizability, and easier life cycle management of ML models, since as data may change, the required number of days for updating a model may be minimized with transfer learning. Moreover, less data storage requirements may be enabled, as less data and models may need to be generated and stored, which further may reduce the energy consumed and compute time.

In the case where there may be less data for the second event, that it would otherwise be required to train a new model from scratch, transfer learning may be understood to be the most successful approach in literature as of today. For example, there may be difficulties in obtaining more data and using it for modelling, e.g., in a target cell, which may even delay the time it may take to achieve a target predictive model for the second event. It may be understood that if transfer learning is used, these requirements may be minimized. However, in order to use transfer learning effectively, and achieve its benefits, good selection of a source model may be understood to be needed for recommendation. Action 203

Transfer learning may help in re-using obsolete knowledge based on the selection of source model, but not always. In other words, transfer learning may not always be useful. This may be understood to impact the advantage that may be gained in performance and compute power from transfer learning. It may be that building a new model to predict the second event may be more advantageous, as it may take less time to train an accurate mode, and it may require less resources.

In some embodiments, in this Action 203, the first node 111, e.g., with its recommender system, may determine whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models may be above a threshold. The threshold may be e.g., a configurable score of transfer learning.

Determining in this Action 203 may be understood as calculating, generating, e.g., by training, or deriving.

In such embodiments wherein the first node 111 may determine whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models may be above the threshold, with the proviso that a respective expected benefit exceeds the threshold, the recommendation may be to use the respective determined first predictive model. The best source model may then be recommended to be used for transfer learning.

With the proviso that the respective expected benefit does not exceed the threshold, the recommendation may be to refrain from using transfer learning and to determine a new predictive model for the second event. When the expected benefit may be below the configured threshold, it may be implied that it may be unlikely that transfer learning using any of the source models may provide benefit. In such case, the machine-learning model for the second event, e.g., a forecasting model at a target cell, may be recommended to be trained from scratch, without use of transfer learning.

By determining whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models may be above the threshold in this Action 203, the first node 111 may enable to build effective models. This is since the first node 111 may check whether at least one source model may be suitable to use in transfer learning for the second event. Else the first node 111 may recommend not to use the source model. The first node 111 may therefore be enabled to recommend whether transfer leaning may add value to a specific case, as opposed to building a model from scratch, and may indicate if it may be helpful to apply transfer learning only for the relevant cases . The first node 111 may therefore be understood to enable to achieve the advantages in time and compute for many events, e.g., wherein transfer learning may be beneficial, and to refrain from applying transfer learning when it may provide no benefits. Hence, accuracy in predicting the second event may be optimized, while optimizing use of time and compute resources.

Action 204

In this Action 204, the first node 111 initiates providing a recommendation, to the second node 112 operating in the communications system 100. The recommendation is of whether or not to use transfer learning to predict the second event. The recommendation is based on the determined one or more first predictive models and the respective expected benefit.

Initiating may be understood as starting, triggering, facilitating or enabling another node to provide the recommendation, or providing the recommendation itself, e.g., via the first link 151. Providing may be e.g., sending or transmitting.

In some embodiments, the recommendation may further indicate at least one of: a) each determined one or more first predictive models, and b) the respective expected benefit of each of the determined one or more first predictive models. In other words, the first node 111 may output whether transfer learning may be advantageous, and if so, the best source model with its respective expected benefit, e.g., reduction in training time, with a confidence score.

By initiating providing the recommendation in this Action 204, the first node 111 may enable to recommend whether transfer leaning may add value to a specific case, as opposed to building a model from scratch, and may indicate if it may be helpful to apply transfer learning only for the relevant cases. The first node 111 may therefore be understood to enable to achieve the advantages in time and compute for many events, e.g., wherein transfer learning may be beneficial, and to refrain from applying transfer learning when it may provide no benefits. Hence, accuracy in predicting the second event may be optimized, while optimizing use of time and compute resources.

Action 205

In some embodiments, in this Action 205, the first node 111 may determine, based on first observed data on the second event, an error of prediction of the second event of at least one of the determined one or more first predictive models.

First observed data may be understood to be empirical data generated or obtained by the first node 111 to train the recommender system. The error of prediction may be understood to be a measure of how well the model is performing for the given task, calculated based on model output and the expected output, and may be calculated as, e.g., Mean squared error (MSE), or Mean Absolute Error (MAE). The expected output may be calculated based on empirical data.

Determining in this Action 205 may be understood as calculating or deriving or obtaining or retrieving. In some embodiments, the first node 111 may determine the error by receiving a first indication of the error from the second node 112, as determined by the second node 112.

By the first node 111 determining the error in this Action 205, the first node 111 may be enabled to assess if further training of the determined machine-learning model used for the determining in Action 202 of the one or more first predictive models may be necessary in order to further increase the accuracy of the determined machine-learning model, and thereby improve the accuracy of the recommendation system in providing a recommendation that results in predicting the second event with optimized accuracy, while optimizing use of time and compute resources.

Action 206

In this Action 206, the first node 111 may retrain the machine-learning model used for the determining in Action 202 of the one or more first predictive models, based on the determined error.

By the first node 111 retraining the machine-learning model used for the determining in Action 202 of the one or more first predictive models, based on the determined error in this Action 206, the first node 111 may be enabled to assess if further training of the determined machine-learning model may be necessary in order to further increase the accuracy of the determined machine-learning model, and thereby improve the accuracy of the recommendation system in providing a recommendation that results in predicting the second event with optimized accuracy, while optimizing use of time and compute resources .

Action 207

In embodiments wherein the recommendation may be to use a recommended first predictive model of the first event, in this Action 207, the first node 111 may receive, from the second node 112, information on a convergence of the recommended first predictive model of the first event with second observed data of the second event.

The second observed data of the second event may be understood to be new data, e.g., collected or obtained by the second node 112, on the second event, on which a model may have been build using transfer learning from the recommended first predictive model. The information on the convergence may be provided as and expected transfer learning advantage score. Such score may be zero if convergence is not expected.

By the first node 111 receiving the information on the convergence of the recommended first predictive model of the first event with second observed data of the second event in this Action 207, the first node 111 may be enabled to assess if further training of the determined machine-learning model used for the determining in Action 202 of the one or more first predictive models may be necessary in order to further increase the accuracy of the determined machine-learning model, and thereby improve the accuracy of the recommendation system in providing a recommendation that results in predicting the second event with optimized accuracy, while optimizing use of time and compute resources.

Action 208

In embodiments wherein the recommendation may be to use a recommended first predictive model of the first event, in this Action 208, the first node 111 may retrain the machine-learning model used for the determining in Action 202 of the one or more first predictive models, based on the received information.

By the first node 111 in this Action 207, retraining the machine-learning model used for the determining in Action 202 of the one or more first predictive models, based on the received information in Action 207, the first node 111 may be enabled to assess if further training of the determined machine-learning model may be necessary in order to further increase the accuracy of the determined machine-learning model, and thereby improve the accuracy of the recommendation system in providing a recommendation that results in predicting the second event with optimized accuracy, while optimizing use of time and compute resources .

Further, a change in configuration, an addition of sites and/or cells, for example, may make models trained on previous data less useful, and new data such as the second set of data, may not be sufficient for building an accurate model. Hence, by performing Action 207 and Action 208, the machine-learning model used for the determining in Action 202 of the one or more first predictive models may be generalizable, and e.g., continuously, updatable, making any necessary adjustments based on e.g., data drift.

Embodiments of a computer-implemented method, performed by the second node 112, will now be described with reference to the flowchart depicted in Figure 3. The method may be understood to be for handling the predictive models. The second node 112 operates in the communications system 100. The method may comprise the actions described below. In some embodiments, all the actions may be performed. In other embodiments, some of the actions may be performed. One or more embodiments may be combined, where applicable. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description. A nonlimiting example of the method performed by the second node 112 is depicted in Figure 3. In Figure 3, optional actions in some embodiments may be represented with dashed lines. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here to simplify the description. For example, the second event may be a performance of a key performance indicator of a radio access network.

Action 301

In this Action 301, the second node 112 receives, from the first node 111 operating in the communications system 100, the recommendation of whether or not to use transfer learning to predict the second event. The recommendation is based on, out of the plurality of predictive models of the plurality of events, the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event. The one or more first predictive models of the first event, to be used as the source model in transfer learning to predict the second event, are determined using machine-learning and based on a respective set of characteristics of each predictive model in the plurality.

The recommendation is also based on, out of the plurality of predictive models of the plurality of events, the respective expected benefit of each of the determined one or more first predictive models in predicting the second event.

The receiving may be performed e.g., via the first link 151.

In some embodiments, the recommendation may further indicate at least one of: a) each determined one or more first predictive models, and b) the respective expected benefit of each of the determined one or more first predictive models.

In some embodiments, with the proviso that the respective expected benefit exceeds the threshold, the recommendation may be to use the respective determined first predictive model. With the proviso that the respective expected benefit does not exceed the threshold, the recommendation may be to refrain from using transfer learning and to determine a new predictive model for the second event. Action 302

In this Action 302, the second node 112 initiates predicting the second event based on the received recommendation.

Initiating may be understood as starting, triggering, facilitating or enabling.

Action 303

In this Action 303, the second node 112 may determine, based on first observed data on the second event, the error of prediction of the second event of at least one of the determined one or more first predictive models.

Action 304

In this Action 304, the second node 112 may send the first indication of the determined error to the first node 111.

Action 305

In embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, in this Action 305, the second node 112 may determine the information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

Action 306

In embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, in this Action 306, the second node 112 may send, to the first node 111, the information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

Embodiments of a computer-implemented method, performed by the communications system 100, will now be described with reference to the flowchart depicted in Figure 4. The method may be understood to be for handling the predictive models. The communications system 100 comprises the first node 111 and the second node 112.

The method may comprise the actions described below. In some embodiments, all the actions may be performed. In other embodiments, some of the actions may be performed. One or more embodiments may be combined, where applicable. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. All possible combinations are not described to simplify the description. A nonlimiting example of the method performed by the communications system 100 is depicted in Figure 4. In Figure 4, optional actions in some embodiments may be represented with dashed lines. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here to simplify the description. For example, the second event may be a performance of a key performance indicator of a radio access network.

Action 401

This Action 401, which corresponds to Action 201, comprises obtaining 401, by the first node 111, the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality.

Action 402

This Action 402, which corresponds to Action 202, comprises determining 402, by the first node 111, using machine-learning and the respective set of characteristics, out of the plurality: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, and ii) the respective expected benefit of each of the determined one or more first predictive models in predicting the second event.

In some embodiments, the determining in this Action 402 of the one or more first predictive models of the first event may comprise training the machine-learning model to select one or more predictive models of one event to predict another event, based on the feedback provided on the convergence of the selected one or more predictive models with observed data of the another event.

Action 403

In some embodiments, the method may further comprise, in this Action 403, which corresponds to Action 203, determining 403, by the first node 111, whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models is above the threshold. With the proviso that the respective expected benefit exceeds the threshold, the recommendation may be to use the respective determined first predictive model. With the proviso that the respective expected benefit does not exceed the threshold, the recommendation may be to refrain from using transfer learning and to determine a new predictive model for the second event.

Action 404

This Action 404, which corresponds to Action 304, comprises, initiating, by the first node 111, providing the recommendation, to the second node 112 operating in the communications system 100, of whether or not to use transfer learning to predict the second event. The recommendation is based on the determined one or more first predictive models and the respective expected benefit.

Action 405

This Action 405, which corresponds to Action 301 comprises, receiving, by the second node 112, the recommendation.

Action 406

This Action 406, which corresponds to Action 302, comprises initiating, by the second node 112, predicting the second event based on the received recommendation.

Action 407

In some embodiments, the method may further comprise, in this Action 407, which corresponds to Action 303, determining, by the second node 112, based on the first observed data on the second event, the error of prediction of the second event of at least one of the determined one or more first predictive models.

Action 408

In some embodiments, the method may further comprise, in this Action 408, which corresponds to Action 304, comprises sending, by the second node 112, the first indication of the determined error to the first node 111.

Action 409

In some embodiments, the method may further comprise, in this Action 409, which corresponds to Action 205, determining, by the first node 111, based on first observed data on the second event, the error of prediction of the second event of at least one of the determined one or more first predictive models.

Action 410

This Action 410, which corresponds to Action 206 comprises, retraining, by the first node 111, the machine-learning model used for the determining 402 of the one or more first predictive models, based on the determined error.

Action 411

In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the method may further comprise, in this his Action 411, which corresponds to Action 305, comprises determining, by the second node 112, the information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

Action 412

In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the method may further comprise, in this Action 412, which corresponds to Action 306, sending, by the second node 112, to the first node 111, the information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

Action 413

In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the method may further comprise, in this Action 413, which corresponds to Action 207, comprises receiving, by the first node 111, the information from the second node 112.

Action 414

In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the method may further comprise, in this Action 414, which corresponds to Action 208, comprises retraining 414, by the first node 111, the machinelearning model used for the determining 402 of the one or more first predictive models, based on the received information.

Figure 5 is a schematic block diagram depicting a non-limiting example of a method performed by the first node 111 according to embodiments herein. Particularly, Figure 5 depicts non-limiting implementation details of the inference phase for model development. In this particular non-limiting example, the second event is a KPI in a target cell. As depicted in Figure 5, the inputs may be: as depicted at 401, a) KPI related data at the target cell for e.g., downlink user throughput, accessibility, or retainability, and b) meta data for the target cell, e.g., taking as example forecasting, the meta data may comprise: i) configuration: frequency band, e.g., an indicator of coverage or capacity based on the cell, and ii) operational data: historical traffic load on the cell, site type, e.g., urban, rural, semi urban, indoor, outdoor etc.., PRB Utilisation, SINR. As depicted at 402, the input may also comprise c) task level configurations such as number of steps into future for forecast, number of days of data available at target model, backtracking number of points to be used from history for forecast. The inputs are obtained by the first node 111, which then uses the trained DL based recommender system, according to Action 202, to output Source Model 1 + Score of TL Advantage of using first predictive model 1, Source Model 2 + Score of TL Advantage of using first predictive model 2, etc.... Based on the configured threshold, the first node 111 may the recommend, according to Action 203 and Action 204, training from scratch or using TL with the best source model.

Figure 6 is a schematic block diagram depicting a non-limiting example of a method performed by the first node 111 according to embodiments herein. Particularly, Figure 6 depicts non-limiting implementation details of the training phase for model development. In this particular non-limiting example, the second event is a KPI in a target cell. As depicted in Figure 6, the inputs may be: as depicted at 501, a) KPI related data of many entities for e.g., downlink user throughput, accessibility, or retainability, and b) meta data for the many entities, e.g., taking as example forecasting, the meta data may comprise: i) configuration: frequency band, e.g., an indicator of coverage or capacity based on the many entities, and ii) operational data: historical traffic load on the many entities, site type, e.g., urban, rural, semi urban, indoor, outdoor etc.., PRB Utilisation, SINR. As depicted at 402, the input may also comprise c) task level configurations such as number of steps into future for forecast, number of days of data available at target model, backtracking number of points to be used from history for forecast. The inputs are obtained by the first node 111, which then uses them to train the DL based recommender system, according to Action 202, to output Source Model 1 + Score of TL Advantage of using first predictive model 1, Source Model 2 + Score of TL Advantage of using first predictive model 2, etc. . . . Figure 6 also illustrates when to trigger for retraining of the system in Action 208 and Action 206, based, respectively, on the feedback from the users received according to Action 207 and loss values, determined according with Action 205.

Non-limiting illustrative examples of experiments on Transfer Learning (TL)

In the next two figures, Figure 7 and Figure 8, two different examples of experimental data are provided to illustrate that source selection is important for the KPI prediction use case. By applying the transfer learning, it may be possible to learn from only the data that may be available at a target cell, with different algorithmic approaches and ensemble. When the source model is not chosen properly, transfer learning may have an adverse effect. The second event that may be used as target for TL based experiments may be a KPI, such as DL User throughput(Mbps), Accessibility(%),Retainability(%). To collect the data on these second events, based on feature selection, data on the following feature selection on KPIs may be collected at the target cell: Average (Avg) DL PRB Utilization (%), Avg Number of DL Active Users, Packet Data Convergence Protocol (PDCP) DL Data Volume (GB), Downlink Packet Loss Rate (%), Uplink Packet Loss Rate (%), Intra LTE Handover per Call, Intra Frequency Handover Success Rate (%), Physical Uplink Control Channel (PUCCH) Signal-to- Interference-plus-Noise Ratio (SINR) Ue Context Abnormal Release Rate (%),Cell Availability (%), Avg Number of Radio Resource Control (RRC) Connected Users, SI Signalling Setup Success Rate (%), Avg DL PRB Utilization (%), Avg Physical Downlink Control Channel (PDCCH) Control Channel Element (CCE) Utilization (%), and E-UTRAN Radio Access Bearer (ERAB) attempts. Taking as example a forecasting use case, the complexity of the initial solution, that is, the number of experiments that may need to be conducted to generate the training data during a part of the method wherein data may be generated for various source-target cell pairs may involve the following elements. As a first element, three types of cells, as to coverage, capacity, and other types, may need to be selected, based on Subject Matter Expert (SME) feedback. Cells may be usually classified as purely for coverage, e.g., on a highway, purely for capacity, e.g., in a city centre. The goal may be understood to be to indicate the kind of variations that may be used to create the training data. As a second element, three clusters per type of cell type may be observed in data in general. Based on experiments, not all cells of a same type may benefit from a same type of source selection. To make the recommender system more accurate and robust, all these types may need to be selected in training data. Within each cell type, three different clusters may need to be selected, and source and targets may need to be selected from those. Without these in the training data, the recommender system may be less accurate. As a third element, two cells per cluster, which may be understood to be two source models. Having two source models may help to capture the variability that there may be per source model in each type and cluster, e.g., as opposed to having only one, where recommender system would not be able to learn any variability. As a fourth element, ten cells may be randomly selected per cluster for the target models. For the aforementioned scenario, the total number of datapoints that may be generated may be 43,740 training data-points, that is, 43740 total training model combinations. The detailed calculation is: a) the number of source cell models is 3 (type-of-cells) * 3 (cluster per cell type) * 2 (cells-per-cluster) =18, b) the number of target cells is 3 (type-of-cells) * 3 (cluster per cell type) * 10 (cells-per-cluster) = 90, and c) the number of configuration combinations = 3 (different forecast windows) * 3 (different rolling windows) * 3 (different data setup). Therefore, the training data may be determined for 18 (source models) * 90 (target cells) * 27 (config combinations) = 43740 total training model combinations. According to the foregoing, the input feature vector for the model architecture size may be 24(rolling window size) * 4 = 96. Model architecture may be understood to refer to a blueprint of parameters and computations which may transform input to output. In a deep learning model, there may be understood to be multiple layers of parameters. The starting layer may be referred to as the input layer. The middle layers may be understood to be hidden layers, and the last layer may be understood to be the output layer. Input layer size may give an idea of how complex the model may be. The input layer dimension of the model in the example provided may be 24*4 = 96, 24 being the number of timesteps that may be checked in the back for time series and 4 being the number of features being chosen for forecasting a single KPI, which may then result in the input dimension being 96. There may be different model architectures, each having a trade-off between time and speed. The architecture may be chosen based on the accuracy and the speed required. Based on that choice, the model sizes may range from 225KB to 2.6MB for cell level models, and 7MB to 20MB for market level models, depending on choice of model architecture. Trainable parameters may be understood to be parameters of the model that may be be trained/learned. The trainable parameters in the model may range from 29000 to 200000 parameters, depending on the choice of model architecture.

Figure 7 is a graphical representation of a comparison of a model performance of forecasting a second event with empirical data, which is here downlink user throughput in Mbps at a cell in a first location, with and without using the models built for different cells at a second, different, location. Figure 7 particularly illustrates an example wherein transfer learning is beneficial. The data available at the first location, represented as “No TL” for no transfer learning, is 14 days and it was 62 days for cells at the second location. The X-axis shows epochs and the Y-axis shows validation Mean squared Error (MSE). The trained model takes the previous 24 Hours as input, and predicts the next hour value of KPI. The line denoted as “No TL (14 days)” shows performance of training of a model from scratch. The other lines, FNL09402_9A_l (62 days), FNL02069_9B_l (62 days) and FNL09011_7A_l (62 days), show a model trained using transfer learning with different source models. As may be observed in Figure 7 the model trained from scratch starts at higher loss, and performance may be seen to be inferior to the other models. All models trained using transfer learning have a better start, better performance and faster convergence.

Figure 8 is another graphical representation of a comparison of a model performance of forecasting a second event with empirical data, which is here also downlink user throughput in Mbps at a cell in a first location, with and without using the models built for another cells at a second, different, location. Figure 8 particularly illustrates an example wherein transfer learning is not beneficial. The line denoted as “No TL (14 days)” shows performance of training of a model from scratch. The line denoted as “FNL02069 9B 1 (62 days)” shows training a model using transfer learning. In this case, training from scratch may be seen to have better performance and convergence, after some epochs.

The examples depicted in each of Figure 7 and Figure 8 illustrate the relevance of source model selection in order to achieve the advantages of transfer learning, as transfer learning may not always be useful.

Certain embodiments disclosed herein may provide one or more of the following technical advantage(s), which may be summarized as follows. A first advantage of embodiments herein may be understood to be that use of a recommender system such as that described herein may avoid the need to hand craft selection criteria for source model selection. The recommender system may enable to obtain a recommendation of suitable models from the set of source models.

As another advantage of embodiments herein, the recommender system of embodiments herein may recommend whether transfer leaning may add value to a specific case, as opposed to building a model from scratch, and may indicate if it may be helpful to apply transfer learning for networks only for the relevant cases. Without a recommender system, it may be understood too hard to select a correct source model for transfer learning. The recommender system may be understood to enable to provide the advantage in time and compute for many events.

As yet another advantage, in cases wherein there may be less data for the second event, e.g., a at the target cell, embodiments herein may be understood to provide the benefits in terms of compute power and time to convergence.

As a further advantage, since the concept of multiple source ML models may be used, embodiments herein may be understood to be able to be generalized to other domains for selection of a source machine-learning model that may be most useful to build a model for the second event. Embodiments have also been used, for example, on regression on KPI using MIMO configurations, and may be extended to further machine-learning use cases, e.g., which may work on RAN Data.

As yet another advantage, particular embodiments herein may enable to build effective models at a target location using input KPIs/counters, that is, features and input task, for specific output KPIs. Table 1 shows an example of the advantage of transfer learning with respect to time and energy saved at a cell level for a forecasting model:

Table 1.

If a generalizable solution is desired that may take into account different hyperparameters of rolling window e.g., three different rolling windows, number of timesteps into future, e.g., three different timesteps, then this may result in training 30K *9 models, for 30K cells where a model may be developed on per cell basis. For nine market areas, the training combinations may be 30K * 9 * 9 = 2.43 M combinations. Thus, the total training time, without use of transfer learning, may be 2.43M * 360 seconds = 874M seconds. Whereas a recommender system such as that described herein for the first node 111 may only require collecting training data for 43K models, and reduced training time for subsequent training.

Figure 9 depicts two different examples in panels a) and b), respectively, of the arrangement that the first node 111 may comprise to perform the method actions described above in relation to Figure 2, and/or Figures 4-8. In some embodiments, the first node 111 may comprise the following arrangement depicted in Figure 9a. The first node 111 may be understood to be for handling the predictive models. The first node 111 is configured to operate in the communications system 100.

Several embodiments are comprised herein. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 9, optional boxes are indicated by dashed lines. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here. For example, the second event may be configured to be a performance of a key performance indicator of a radio access network. The first node 111 is configured to, e.g. by means of an obtaining unit 901 within the first node 111 configured to, obtain the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality.

The first node 111 is also configured to, e.g. by means of a determining unit 902 within the first node 111 configured to, determine, using machine-learning and the respective set of characteristics, out of the plurality: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, and ii) the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event.

The first node 111 is further configured to, e.g. by means of an initiating unit 903 within the first node 111 configured to, initiate providing the recommendation, to the second node 112 configured to operate in the communications system 100, of whether or not to use transfer learning to predict the second event. The recommendation is configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined.

In some embodiments, the recommendation may be further configured to indicate at least one of: a) each one or more first predictive models configured to be determined, and b) the respective expected benefit of each of the one or more first predictive models configured to be determined.

In some embodiments, the determining of the one or more first predictive models of the first event may be configured to comprise training the machine-learning model to select the one or more predictive models of one event to predict another event, based on the feedback configured to be provided on the convergence of the one or more predictive models configured to be selected with observed data of the another event.

In some embodiments, the first node 111 may be further configured to, e.g. by means of the determining unit 902 within the first node 111 configured to, determine, based on the first observed data on the second event, the error of prediction of the second event of at least one of the one or more first predictive models configured to be determined.

In some embodiments, the first node 111 may be further configured to, e.g. by means of a retraining unit 904 within the first node 111 configured to, retrain the machine-learning model used for the determining of the one or more first predictive models, based on the error configured to be determined. In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the first node 111 may be further configured to, e.g. by means of a receiving unit 905 within the first node 111 configured to, receive, from the second node 112, information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

In some embodiments, wherein the recommendation is to use a recommended first predictive model of the first event, the first node 111 may be further configured to, e.g. by means of the retraining unit 904 within the first node 111 configured to, retrain the machinelearning model used for the determining of the one or more first predictive models, based on the information configured to be received.

In some embodiments, the first node 111 may be further configured to, e.g. by means of the determining unit 902 within the first node 111 configured to, determine whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models may be above the threshold. With the proviso that the respective expected benefit exceeds the threshold, the recommendation may be configured to be to use the respective determined first predictive model. With the proviso that the respective expected benefit does not exceed the threshold, the recommendation may be configured to be to refrain from using transfer learning and to determine the new predictive model for the second event.

The embodiments herein may be implemented through one or more processors, such as a processor 906 in the first node 111 depicted in Figure 9, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the in the first node 111. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the first node 111.

The first node 111 may further comprise a memory 907 comprising one or more memory units. The memory 907 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the first node 111.

In some embodiments, the first node 111 may receive information from, e.g., the second node 112, the radio network node 130, the device 140, and/or another node through a receiving port 908. In some examples, the receiving port 908 may be, for example, connected to one or more antennas in the first node 111. In other embodiments, the first node 111 may receive information from another structure in the communications system 100 through the receiving port 908. Since the receiving port 908 may be in communication with the processor 906, the receiving port 908 may then send the received information to the processor 906. The receiving port 908 may also be configured to receive other information.

The processor 906 in the first node 111 may be further configured to transmit or send information to e.g., the second node 112, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100, through a sending port 909, which may be in communication with the processor 906, and the memory 907.

Those skilled in the art will also appreciate that any of the units 901-905 described above may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processor 906, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application- Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Any of the units 901-905 described above may be the processor 906 of the first node 111, or an application running on such processor.

Thus, the methods according to the embodiments described herein for the first node 111 may be respectively implemented by means of a computer program 910 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 906, cause the at least one processor 906 to carry out the actions described herein, as performed by the first node 111. The computer program 910 product may be stored on a computer- readable storage medium 911. The computer-readable storage medium 911, having stored thereon the computer program 910, may comprise instructions which, when executed on at least one processor 906, cause the at least one processor 906 to carry out the actions described herein, as performed by the first node 111. In some embodiments, the computer-readable storage medium 911 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, a memory stick, or stored in the cloud space. In other embodiments, the computer program 910 product may be stored on a carrier containing the computer program, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 911, as described above.

The first node 111 may comprise an interface unit to facilitate communications between the first node 111 and other nodes or devices, e.g., the second node 112, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100. In some particular examples, the interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the first node 111 may comprise the following arrangement depicted in Figure 9b. The first node 111 may comprise a processing circuitry 906, e.g., one or more processors such as the processor 906, in the first node 111 and the memory 907. The first node 111 may also comprise a radio circuitry 912, which may comprise e.g., the receiving port 908 and the sending port 909. The processing circuitry 906 may be configured to, or operable to, perform the method actions according to Figure 2, and/or Figures 4-8, in a similar manner as that described in relation to Figure 9a. The radio circuitry 912 may be configured to set up and maintain at least a wireless connection with the second node 112, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100.

Hence, embodiments herein also relate to the first node 111 operative for handling the anomalous event, the first node 111 being operative to operate in the communications system 100. The first node 111 may comprise the processing circuitry 906 and the memory 907, said memory 907 containing instructions executable by said processing circuitry 906, whereby the first node 111 is further operative to perform the actions described herein in relation to the first node 111, e.g., in Figure 2, and/or Figures 4-8.

Figure 10 depicts two different examples in panels a) and b), respectively, of the arrangement that the second node 112 may comprise to perform the method actions described above in relation to Figure 3, and/or Figures 7-8. In some embodiments, the second node 112 may comprise the following arrangement depicted in Figure 10a. The second node 112 may be understood to be for handling the predictive models. The second node 112 is configured to operate in the communications system 100.

Several embodiments are comprised herein. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 10, optional boxes are indicated by dashed lines. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here. For example, the second event may be configured to be performance of a key performance indicator of a radio access network.

The second node 112 is configured to, e.g. by means of a receiving unit 1001 within the second node 112 configured to, receive, from the first node 111 configured to operate in the communications system 100, the recommendation. The recommendation is configured to be of whether or not to use transfer learning to predict a the second event. The recommendation is configured to be based on, out of the plurality of predictive models of the plurality of events: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, configured to be determined using machine-learning and based on the respective set of characteristics of each predictive model in the plurality, and ii) the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event.

The second node 112 is also configured to, e.g. by means of an initiating unit 1002 within the second node 112 configured to, initiate predicting the second event based on the recommendation configured to be received.

The second node 112 may be also configured to, e.g. by means of a determining unit 1003 within the second node 112 configured to, determine, based on the first observed data on the second event, the error of prediction of the second event of at least one of the one or more first predictive models configured to be determined.

The second node 112 may be also configured to, e.g. by means of a sending unit 1004 within the second node 112 configured to, send the first indication of the error configured to be determined to the first node 111.

In some embodiments, wherein the recommendation may be configured to be to use a recommended first predictive model of the first event, the second node 112 may be also configured to, e.g. by means of the determining unit 1003 within the second node 112 configured to, determine the information on the convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event. In some embodiments, wherein the recommendation may be configured to be to use a recommended first predictive model of the first event, the second node 112 may be also configured to, e.g. by means of the sending unit 1004 within the second node 112 configured to, send, to the first node 111, the information on the convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event.

In some embodiments, with the proviso that the respective expected benefit exceeds the threshold, the received recommendation may be configured to be to use the respective determined first predictive model.

In some embodiments, with the proviso that the respective expected benefit does not exceed the threshold, the received recommendation may be configured to be to refrain from using transfer learning and to determine the new predictive model for the second event.

The embodiments herein may be implemented through one or more processors, such as a processor 1005 in the second node 112 depicted in Figure 10, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the in the second node 112. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the second node 112.

The second node 112 may further comprise a memory 1006 comprising one or more memory units. The memory 1006 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the second node 112.

In some embodiments, the second node 112 may receive information from, e.g., the first node 111, the radio network node 130, the device 140, and/or another node, through a receiving port 1007. In some examples, the receiving port 1007 may be, for example, connected to one or more antennas in the second node 112. In other embodiments, the second node 112 may receive information from another structure in the communications system 100 through the receiving port 1007. Since the receiving port 1007 may be in communication with the processor 1005, the receiving port 1007 may then send the received information to the processor 1005. The receiving port 1007 may also be configured to receive other information. The processor 1005 in the second node 112 may be further configured to transmit or send information to e.g., the first node 111, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100, through a sending port 1008, which may be in communication with the processor 1005, and the memory 1006.

Those skilled in the art will also appreciate that the units 1001-1002 described above may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processor 1005, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application- Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

The units 1001-1002 described above may be the processor 1005 of the second node 112, or an application running on such processor.

Thus, the methods according to the embodiments described herein for the second node 112 may be respectively implemented by means of a computer program 1009 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 1005, cause the at least one processor 1005 to carry out the actions described herein, as performed by the second node 112. The computer program 1009 product may be stored on a computer-readable storage medium 1010. The computer-readable storage medium 1010, having stored thereon the computer program 1009, may comprise instructions which, when executed on at least one processor 1005, cause the at least one processor 1005 to carry out the actions described herein, as performed by the second node 112. In some embodiments, the computer-readable storage medium 1010 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, a memory stick, or stored in the cloud space. In other embodiments, the computer program 1009 product may be stored on a carrier containing the computer program, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1010, as described above.

The second node 112 may comprise an interface unit to facilitate communications between the second node 112 and other nodes or devices, e.g., the first node 111, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100. In some particular examples, the interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the second node 112 may comprise the following arrangement depicted in Figure 10b. The second node 112 may comprise a processing circuitry 1005, e.g., one or more processors such as the processor 1005, in the second node 112 and the memory 1006. The second node 112 may also comprise a radio circuitry 1011, which may comprise e.g., the receiving port 1007 and the sending port 1008. The processing circuitry 1005 may be configured to, or operable to, perform the method actions according to Figure 3 and/or Figures 7-8, in a similar manner as that described in relation to Figure 10a. The radio circuitry 1011 may be configured to set up and maintain at least a wireless connection with the first node 111, the radio network node 130, the device 140, another node, and/or another structure in the communications system 100.

Hence, embodiments herein also relate to the second node 112 operative for handling the anomalous event, the second node 112 being operative to operate in the communications system 100. The second node 112 may comprise the processing circuitry 1005 and the memory 1006, said memory 1006 containing instructions executable by said processing circuitry 1005, whereby the second node 112 is further operative to perform the actions described herein in relation to the second node 112, e.g., in Figure 3 and/or Figures 7-8.

Figure 11 depicts two different examples in panels a) and b), respectively, of the arrangement that the communications system 100 may comprise to perform the method actions described above in relation to Figure 4 and/or Figures 5-8. The arrangement depicted in panel a) corresponds to that described in relation to panel a) in Figure 9 and Figure 10 for each of the first node 111 and the second node 112, respectively. The arrangement depicted in panel b) corresponds to that described in relation to panel b) in Figure 9 and Figure 10 for each of the first node 111 and the second node 112, respectively. The communications system 100 may be for handling the predictive models. The communications system 100 is configured to comprise the first node 111 and the second node 112.

Several embodiments are comprised herein. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 11, optional boxes are indicated by dashed lines. The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here. For example, the second event may be configured to be performance of a key performance indicator of a radio access network.

The communications system 100 is configured to, e.g. by means of the obtaining unit 901 within the first node 111 configured to, obtain, by the first node 111, the plurality of predictive models of the plurality of events, and the respective set of characteristics of each predictive model in the plurality.

The communications system 100 is also configured to, e.g. by means of the determining unit 902 within the first node 111 configured to, determine, by the first node 111, using machine-learning and the respective set of characteristics, out of the plurality: i) the one or more first predictive models of the first event, of the plurality of events, to be used as the source model in transfer learning to predict the second event, and ii) the respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event.

The communications system 100 is configured to, e.g. by means of an initiating unit 903 within the first node 111 configured to, initiate, by the first node 111, providing the recommendation, to the second node 112 configured to operate in the communications system 100, of whether or not to use transfer learning to predict the second event. The recommendation is configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined.

The communications system 100 is also configured to, e.g. by means of a receiving unit 1001 within the second node 112 configured to, receive, by the second node 112, from the first node 111 configured to operate in the communications system 100, the recommendation.

The communications system 100 is further configured to, e.g. by means of an initiating unit 1002 within the second node 112 configured to, initiate, by the second node 112, predicting the second event based on the recommendation configured to be received.

The communications system 100 may be configured to, e.g. by means of the determining unit 902 within the first node 111 configured to, determine, based on the first observed data on the second event, the error of prediction of the second event of at least one of the one or more first predictive models configured to be determined.

The communications system 100 may be configured to, e.g. by means of a retraining unit 904 within the first node 111 configured to, retrain, by the first node 111, the machinelearning model used for the determining of the one or more first predictive models, based on the error configured to be determined.

The communications system 100 may be configured to, e.g. by means of a determining unit 1003 within the second node 112 configured to, determine, by the second node 112, based on the first observed data on the second event, the error of prediction of the second event of at least one of the one or more first predictive models configured to be determined.

The communications system 100 may be configured to, e.g. by means of a sending unit 1004 within the second node 112 configured to, send, by the second node 112, the first indication of the error configured to be determined to the first node 111.

In some embodiments wherein the recommendation may be to use the recommended first predictive model of the first event, the communications system 100 may be further configured to, e.g. by means of a receiving unit 905 within the first node 111 configured to, receive, by the first node 111, from the second node 112, information on the convergence of the recommended first predictive model of the first event with second observed data of the second event.

In some embodiments, wherein the recommendation may be to use a recommended first predictive model of the first event, the communications system 100 may be further configured to, e.g. by means of the retraining unit 904 within the first node 111 configured to, retrain, by the first node 111, the machine-learning model used for the determining of the one or more first predictive models, based on the information configured to be received.

In some embodiments, the communications system 100 may be further configured to, e.g. by means of the determining unit 902 within the first node 111 configured to, determine, by the first node 111, whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models may be above the threshold. With the proviso that the respective expected benefit exceeds the threshold, the recommendation may be configured to be to use the respective determined first predictive model. With the proviso that the respective expected benefit does not exceed the threshold, the recommendation may be configured to be to refrain from using transfer learning and to determine the new predictive model for the second event.

In some embodiments, wherein the recommendation may be configured to be to use the recommended first predictive model of the first event, the communications system 100 may be also configured to, e.g. by means of the determining unit 1003 within the second node 112 configured to, determine, by the second node 112, the information on the convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event.

In some embodiments, wherein the recommendation may be configured to be to use the recommended first predictive model of the first event, the communications system 100 may be also configured to, e.g. by means of the sending unit 1004 within the second node 112 configured to, send, to the first node 111, the information on the convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event.

The methods according to the embodiments described herein for the communications system 100 may be respectively implemented by means of a computer program 1101 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 906, 1005, cause the at least one processor 906, 1005 to carry out the actions described herein, as performed by the communications system 100. The computer program

1101 product may be stored on a computer-readable storage medium 1102. The computer- readable storage medium 1102, having stored thereon the computer program 1101, may comprise instructions which, when executed on at least one processor 906, 1005, cause the at least one processor 906, 1005 to carry out the actions described herein, as performed by the communications system 100. In some embodiments, the computer-readable storage medium

1102 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, a memory stick, or stored in the cloud space. In other embodiments, the computer program 1101 product may be stored on a carrier containing the computer program, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1102, as described above.

The remaining configurations described for the first node 111 and the second node 112 in relation to Figure 11, may be understood to correspond to those described in Figure 9, and Figure 10, respectively, and to be performed, e.g., by means of the corresponding units and arrangements described in Figure 9 and Figure 10, which will not be repeated here.

When using the word "comprise" or “comprising”, it shall be interpreted as non- limiting, i.e. meaning "consist at least of".

The embodiments herein are not limited to the above-described preferred embodiments. Various alternatives, modifications and equivalents may be used. Therefore, the above embodiments should not be taken as limiting the scope of the invention.

Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following description.

As used herein, the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “and” term, may be understood to mean that only one of the list of alternatives may apply, more than one of the list of alternatives may apply or all of the list of alternatives may apply. This expression may be understood to be equivalent to the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “or” term.

Any of the terms processor and circuitry may be understood herein as a hardware component.

As used herein, the expression “in some embodiments” has been used to indicate that the features of the embodiment described may be combined with any other embodiment or example disclosed herein. As used herein, the expression “in some examples” has been used to indicate that the features of the example described may be combined with any other embodiment or example disclosed herein.

Claims

CLAIMS:

1. A computer- implemented method, performed by a first node (111), the method being for handling predictive models, the first node (111) operating in a communications system (100), the method comprising:

- obtaining (201) a plurality of predictive models of a plurality of events, and a respective set of characteristics of each predictive model in the plurality,

- determining (202), using machine-learning and the respective set of characteristics, out of the plurality: i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict a second event, and ii. a respective expected benefit of each of the determined one or more first predictive models in predicting the second event, and

- initiating (204) providing a recommendation, to a second node (112) operating in the communications system (100), of whether or not to use transfer learning to predict the second event, the recommendation being based on the determined one or more first predictive models and the respective expected benefit.

2. The method according to claim 1, wherein the recommendation further indicates at least one of: a. each determined one or more first predictive models, and b. the respective expected benefit of each of the determined one or more first predictive models.

3. The method according to any of claims 1-2, wherein the determining (202) of the one or more first predictive models of the first event comprises training a machine-learning model to select one or more predictive models of one event to predict another event, based on feedback provided on a convergence of the selected one or more predictive models with observed data of the another event.

4. The method according to claim 3, further comprising:

- determining (205), based on first observed data on the second event, an error of prediction of the second event of at least one of the determined one or more first predictive models, and

48 - retraining (206) the machine-learning model used for the determining (202) of the one or more first predictive models, based on the determined error. The method according to any of claims 3-4, wherein the recommendation is to use a recommended first predictive model of the first event, and wherein the method further comprises:

- receiving (207), from the second node (112), information on a convergence of the recommended first predictive model of the first event with second observed data of the second event, and

- retraining (208) the machine-learning model used for the determining (202) of the one or more first predictive models, based on the received information. The method according to any of claims 1-5, further comprising:

- determining (203) whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models is above a threshold, and wherein: i. with the proviso that a respective expected benefit exceeds the threshold, the recommendation is to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the recommendation is to refrain from using transfer learning and to determine a new predictive model for the second event. The method according to any of claims 1-6, wherein the second event is a performance of a key performance indicator of a radio access network. A computer- implemented method, performed by a second node (112), the method being for handling predictive models, the second node (112) operating in a communications system (100), the method comprising:

- receiving (301), from a first node (111) operating in the communications system (100), a recommendation of whether or not to use transfer learning to predict a second event, the recommendation being based on, out of a plurality of predictive models of a plurality of events:

49 i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict the second event, determined using machine-learning and based on a respective set of characteristics of each predictive model in the plurality, and ii. a respective expected benefit of each of the determined one or more first predictive models in predicting the second event, and

- initiating (302) predicting the second event based on the received recommendation. The method according to claim 8, wherein the recommendation further indicates at least one of: a. each determined one or more first predictive models, and b. the respective expected benefit of each of the determined one or more first predictive models. The method according to any of claims 8-9, further comprising:

- determining (303), based on first observed data on the second event, an error of prediction of the second event of at least one of the determined one or more first predictive models, and

- sending (304) a first indication of the determined error to the first node (111). The method according to any of claims 8-10, wherein the recommendation is to use a recommended first predictive model of the first event, and wherein the method further comprises:

- determining (305) information on a convergence of the recommended first predictive model of the first event with second observed data of the second event, and

- sending (306), to the first node (111), information on a convergence of the recommended first predictive model of the first event with second observed data of the second event. The method according to any of claims 8-11, wherein:

50 i. with the proviso that a respective expected benefit exceeds a threshold, the received recommendation is to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the received recommendation is to refrain from using transfer learning and to determine a new predictive model for the second event. The method according to any of claims 8-12, wherein the second event is a performance of a key performance indicator of a radio access network. A computer- implemented method, performed by a communications system (100), the method being for handling predictive models, the communications system (100) comprising a first node (111) and a second node (112), the method comprising:

- obtaining (401), by the first node (111), a plurality of predictive models of a plurality of events, and a respective set of characteristics of each predictive model in the plurality,

- determining (402), by the first node (111), using machine-learning and the respective set of characteristics, out of the plurality: i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict a second event, and ii. a respective expected benefit of each of the determined one or more first predictive models in predicting the second event,

- initiating (404), by the first node (111), providing a recommendation, to a second node (112) operating in the communications system (100), of whether or not to use transfer learning to predict the second event, the recommendation being based on the determined one or more first predictive models and the respective expected benefit,

- receiving (405), by the second node (112), the recommendation, and

- initiating (406), by the second node (112), predicting the second event based on the received recommendation. The method according to claim 14, wherein the recommendation further indicates at least one of:

51 a. each determined one or more first predictive models, and b. the respective expected benefit of each of the determined one or more first predictive models.

16. The method according to any of claims 14-15, wherein the determining (402) of the one or more first predictive models of the first event comprises training a machine-learning model to select one or more predictive models of one event to predict another event, based on feedback provided on a convergence of the selected one or more predictive models with observed data of the another event.

17. The method according to claim 16, further comprising:

- determining (409), by the first node (111), based on first observed data on the second event, an error of prediction of the second event of at least one of the determined one or more first predictive models, and

- retraining (410), by the first node (111), the machine-learning model used for the determining (402) of the one or more first predictive models, based on the determined error.

18. The method according to any of claims 16-17, wherein the recommendation is to use a recommended first predictive model of the first event, and wherein the method further comprises:

- determining (411), by the second node (112), information on a convergence of the recommended first predictive model of the first event with second observed data of the second event,

- sending (412), by the second node (112), to the first node (111), information on a convergence of the recommended first predictive model of the first event with second observed data of the second event,

- receiving (413), by the first node (111), the information from the second node (112), and

- retraining (414), by the first node (111), the machine-learning model used for the determining (402) of the one or more first predictive models, based on the received information.

19. The method according to any of claims 14-19, further comprising: - determining (403), by the first node (111), whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models is above a threshold, and wherein: i. with the proviso that a respective expected benefit exceeds the threshold, the recommendation is to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the recommendation is to refrain from using transfer learning and to determine a new predictive model for the second event.

20. The method according to any of claims 14-19, wherein the second event is a performance of a key performance indicator of a radio access network.

21. The method according to any of claims 14-20, further comprising:

- determining (407), by the second node (112), based on first observed data on the second event, an error of prediction of the second event of at least one of the determined one or more first predictive models, and

- sending (408), by the second node (112), a first indication of the determined error to the first node (111).

22. A first node (111), for handling predictive models, the first node (111) being configured to operate in a communications system (100), the first node (111) being further configured to:

- obtain a plurality of predictive models of a plurality of events, and a respective set of characteristics of each predictive model in the plurality,

- determine, using machine-learning and the respective set of characteristics, out of the plurality: i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict a second event, and ii. a respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event, and

- initiate providing a recommendation, to a second node (112) configured to operate in the communications system (100), of whether or not to use transfer learning to predict the second event, the recommendation being configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined. The first node (111) according to claim 22, wherein the recommendation is further configured to indicate at least one of: a. each one or more first predictive models configured to be determined, and b. the respective expected benefit of each of the one or more first predictive models configured to be determined. The first node (111) according to any of claims 22-23, wherein the determining (202) of the one or more first predictive models of the first event is configured to comprise training a machine-learning model to select one or more predictive models of one event to predict another event, based on feedback configured to be provided on a convergence of the one or more predictive models configured to be selected with observed data of the another event. The first node (111) according to claim 24, further configured to:

- determine, based on first observed data on the second event, an error of prediction of the second event of at least one of the one or more first predictive models configured to be determined, and

- retrain the machine-learning model used for the determining (202) of the one or more first predictive models, based on the error configured to be determined. The first node (111) according to any of claims 24-25, wherein the recommendation is to use a recommended first predictive model of the first event, and wherein the first node (111) is further configured to:

- receive, from the second node (112), information on a convergence of the recommended first predictive model of the first event with second observed data of the second event, and

- retrain the machine-learning model used for the determining of the one or more first predictive models, based on the information configured to be received. The first node (111) according to any of claims 22-26, being further configured to:

54 - determine (203) whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models is above a threshold, and wherein: i. with the proviso that a respective expected benefit exceeds the threshold, the recommendation is configured to be to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the recommendation is configured to be to refrain from using transfer learning and to determine a new predictive model for the second event. The first node (111) according to any of claims 22-27, wherein the second event is configured to be a performance of a key performance indicator of a radio access network. A second node (112), for handling predictive models, the second node (112) being configured to operate in a communications system (100), the second node (112) being further configured to:

- receive, from a first node (111) configured to operate in the communications system (100), a recommendation of whether or not to use transfer learning to predict a second event, the recommendation being configured to be based on, out of a plurality of predictive models of a plurality of events: i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict the second event, configured to be determined using machine-learning and based on a respective set of characteristics of each predictive model in the plurality, and ii. a respective expected benefit of each of the one or more first predictive models configured to be determined, in predicting the second event, and

- initiate predicting the second event based on the recommendation configured to be received. The second node (112) according to claim 29, wherein the recommendation is further configured to indicate at least one of: a. each one or more first predictive models configured to be determined, and

55 b. the respective expected benefit of each of the one or more first predictive models configured to be determined. The second node (112) according to any of claims 29-30, being further configured to:

- send a first indication of the error configured to be determined to the first node (H l). The second node (112) according to any of claims 29-31, wherein the recommendation is to configured to be to use a recommended first predictive model of the first event, and wherein the second node (112) is further configured to:

- determine information on a convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event, and

- send, to the first node (111), information on a convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event. The second node (112) according to any of claims 29-32, wherein: i. with the proviso that a respective expected benefit exceeds a threshold, the received recommendation is configured to be to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the received recommendation is configured to be to refrain from using transfer learning and to determine a new predictive model for the second event. The second node (112) according to any of claims 29-33, wherein the second event is configured to be a performance of a key performance indicator of a radio access network. A communications system (100), for handling predictive models, the communications system (100) being configured to comprise a first node (111) and a second node (112), the communications system (100) being further configured to:

56 - obtain, by the first node (111), a plurality of predictive models of a plurality of events, and a respective set of characteristics of each predictive model in the plurality,

- determine, by the first node (111), using machine-learning and the respective set of characteristics, out of the plurality: i. one or more first predictive models of a first event, of the plurality of events, to be used as a source model in transfer learning to predict a second event, and ii. a respective expected benefit of each of the one or more first predictive models configured to be determined in predicting the second event,

- initiate (404), by the first node (111), providing a recommendation, to a second node (112) configured to operate in the communications system (100), of whether or not to use transfer learning to predict the second event, the recommendation being configured to be based on the one or more first predictive models and the respective expected benefit configured to be determined,

- receive, by the second node (112), the recommendation, and

- initiate, by the second node (112), predicting the second event based on the recommendation configured to be received.

36. The communications system (100) according to claim 35, wherein the recommendation is further configured to indicate at least one of: a. each determined one or more first predictive models, and b. the respective expected benefit of each of the one or more first predictive models configured to be determined.

37. The communications system (100) according to any of claims 35-36, wherein the determining of the one or more first predictive models of the first event is configured to comprise training a machine-learning model to select one or more predictive models of one event to predict another event, based on feedback provided on a convergence of the one or more predictive models configured to be selected with observed data of the another event.

38. The communications system (100) according to claim 37, being further configured to:

57 - determine, by the first node (111), based on first observed data on the second event, an error of prediction of the second event of at least one of the one or more first predictive models configured to be determined, and

- retrain, by the first node (111), the machine-learning model configured to be used for the determining of the one or more first predictive models, based on the error configured to be determined. The communications system (100) according to any of claims 37-38, wherein the recommendation is configured to be to use a recommended first predictive model of the first event, and wherein the communications system (100) is further configured to:

- determine, by the second node (112), information on a convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event,

- send, by the second node (112), to the first node (111), information on a convergence of the first predictive model of the first event configured to be recommended with second observed data of the second event,

- receive, by the first node (111), the information from the second node (112), and

- retrain, by the first node (111), the machine-learning model configured to be used for the determining of the one or more first predictive models, based on the information configured to be received. The communications system (100) according to any of claims 35-39, being further configured to:

- determine, by the first node (111), whether or not each of the determined respective expected benefits of each of the determined one or more first predictive models is above a threshold, and wherein: i. with the proviso that a respective expected benefit exceeds the threshold, the recommendation is configured to be to use the respective determined first predictive model, and ii. with the proviso that the respective expected benefit does not exceed the threshold, the recommendation is configured to be to refrain from using transfer learning and to determine a new predictive model for the second event.

58

41. The communications system (100) according to any of claims 35-40, wherein the second event is configured to be a performance of a key performance indicator of a radio access network.

42. The communications system (100) according to any of claims 35-41, being further configured to:

- determine, by the second node (112), based on first observed data on the second event, an error of prediction of the second event of at least one of the one or more first predictive models configured to be determined, and

- send, by the second node (112), a first indication of the error configured to be determined to the first node (111).

43. A computer program (910), comprising instructions which, when executed on at least one processor (906), cause the at least one processor (906) to carry out the method according to any one of claims 1 to 7.

44. A computer-readable storage medium (911), having stored thereon a computer program (910), comprising instructions which, when executed on at least one processor (906), cause the at least one processor (906) to carry out the method according to any one of claims 1 to 7.

45. A computer program (1009), comprising instructions which, when executed on at least one processor (1005), cause the at least one processor (1005) to carry out the method according to any one of claims 8 to 13.

46. A computer-readable storage medium (1010), having stored thereon a computer program (1009), comprising instructions which, when executed on at least one processor (1005), cause the at least one processor (1005) to carry out the method according to any one of claims 8 to 13.

47. A computer program (1101), comprising instructions which, when executed on at least one processor (906, 1005), cause the at least one processor (906, 1005) to carry out the method according to any one of claims 14 to 21.

48. A computer-readable storage medium (1102), having stored thereon a computer program (1101), comprising instructions which, when executed on at least one processor

59 (906, 1005), cause the at least one processor (906, 1005) to carry out the method according to any one of claims 14 to 21.

60