WO2020135510A1

WO2020135510A1 - Burst load prediction method and device, storage medium and electronic device

Info

Publication number: WO2020135510A1
Application number: PCT/CN2019/128337
Authority: WO
Inventors: Meng Sheng (孟晟)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2018-12-29
Filing date: 2019-12-25
Publication date: 2020-07-02
Also published as: CN111385128B; CN111385128A

Abstract

The present disclosure provides a burst load prediction method and device, a storage medium, and an electronic device. The burst load prediction method comprises: collecting load index data of a designated area, the load index data being used to characterize the load condition of the designated area; and analyzing the load index data with a burst load model to predict the occurrence of a burst load, the burst load model being trained by machine learning on a plurality of groups of data.

Description

Method and device for predicting burst load, storage medium, and electronic device

Technical field

This disclosure relates to, but is not limited to, the field of communications.

Background

Service networks usually have limited resources, and there are typically three situations in which the load of an access node or of the whole network approaches or even reaches its upper limit: a gradual approach to the capacity limit over the medium to long term; periodic load surges over the short to medium term; and sudden surges in real time. Short- to long-term load forecasting at hourly or coarser granularity is generally based on time-series models. A sudden load surge, by contrast, is mainly driven by the random behavior of network users and depends only on the period immediately preceding its occurrence. The congestion it causes remains an open problem.

Taking a cellular communication network as an example, when the cell load exceeds a certain level, the service performance experienced by users in the cell degrades (for example, delays and stalls). In severe cases, congestion causes the basic system indicators to deteriorate sharply, so that user services cannot proceed normally. Especially in scenarios with high user density, such as sports games, concerts, and large-scale gatherings, relieving or even avoiding real-time congestion is a pain point for both operators and users.

The traditional scheme is implemented in three main ways: (1) Planning and laying out the region according to the user capacity required at the time of congestion. This seriously reduces spectrum utilization and greatly increases network cost in ordinary periods. (2) Fixing the parameters of the cells in the area at the "maximum number of accesses" to guarantee only basic user connectivity. This sacrifices user rate and service types. (3) To balance spectrum utilization and user experience, manually monitoring various network indicators when a high user density (or high traffic) scenario occurs, and manually adjusting the network parameters according to whether the indicators exceed set thresholds. However, this requires many operation and maintenance personnel to be deployed on site, which greatly increases labor costs.

Summary of the invention

According to an embodiment of the present disclosure, there is provided a method for predicting a burst load, including: collecting load index data of a specified area, wherein the load index data is used to characterize the load of the specified area; and analyzing the load index data using a burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning using multiple sets of data.

According to an embodiment of the present disclosure, there is also provided a device for predicting a burst load, including: a collection module configured to collect load index data of a specified area, wherein the load index data is used to characterize the load condition of the specified area; and a prediction module configured to analyze the load index data using a burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning using multiple sets of data.

According to an embodiment of the present disclosure, there is also provided a storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor performs the burst load prediction method according to the present disclosure.

According to another embodiment of the present disclosure, there is also provided an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor, when running the computer program, performs the burst load prediction method according to the present disclosure.

Brief description of the drawings

The drawings described herein are used to provide a further understanding of the present disclosure and form a part of the present disclosure. The exemplary embodiments and descriptions of the present disclosure are used to explain the present disclosure and do not constitute an undue limitation on the present disclosure. In the drawings:

FIG. 1 is a flowchart of a burst load prediction method according to an embodiment of the present disclosure;

FIG. 2 is a structural block diagram of a burst load prediction device according to an embodiment of the present disclosure;

FIG. 3 is another structural block diagram of a burst load prediction device according to an embodiment of the present disclosure;

FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 5 is a system block diagram according to an embodiment of the present disclosure;

FIG. 6 is an overall flowchart according to an exemplary embodiment of the present disclosure;

FIG. 7 is a schematic diagram of sequence alignment search according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of boundary parameters according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of screening services by equivalent evaluation indicators according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a decision tree method for automatic labeling of real-time congestion according to an embodiment of the present disclosure.

Detailed description

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that the embodiments of the present disclosure and the features in the embodiments can be combined with each other without conflict.

It should be noted that the terms "first", "second", etc. in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

An embodiment of the present disclosure provides a method for predicting a burst load. FIG. 1 is a flowchart of a method for predicting a burst load according to an embodiment of the present disclosure. As shown in FIG. 1, the burst load prediction method according to an embodiment of the present disclosure includes steps S102 to S104.

In step S102, load index data of a specified area is collected, wherein the load index data is used to characterize the load condition of the specified area.

In step S104, the load index data is analyzed using a burst load model to predict the occurrence of burst load, wherein the burst load model is trained by machine learning using multiple sets of data.

Through the above steps, load index data of the specified area, which characterizes the load of that area, is collected and analyzed using a burst load model trained by machine learning on multiple sets of data, so as to predict the occurrence of a burst load. This solves the problem in the related art that traditional technical solutions for handling real-time congestion caused by sudden load surges are costly, so that such congestion can be handled while labor costs are reduced.

It should be noted that the occurrence of the burst load may refer to the time at which the burst load occurs, its duration, and other aspects of the burst load.

In an embodiment of the present disclosure, before the step of analyzing the load index data using the burst load model, the method further includes: selecting, from a plurality of burst load models, a burst load model suitable for the designated area.

In an embodiment of the present disclosure, before the step of selecting a burst load model suitable for the specified area from a plurality of burst load models, the method further includes: acquiring historical data collected at the base station; pre-processing the historical data to obtain the entire-area cell data set; selecting a congestion data set from the entire-area cell data set according to a specified rule; and annotating the congestion data set to obtain the plurality of burst load models.

In the embodiment of the present disclosure, the historical data includes at least one of the following: load indicator data, network key performance indicator data, key service quality indicator data, and user behavior indicator data.

In an embodiment of the present disclosure, the entire-area cell data set includes independent variable data and dependent variable data, and is obtained at least by: obtaining independent variable data and dependent variable data that satisfy preset conditions, wherein the independent variable data includes load-type data, and the dependent variable data has a specified functional relationship with the independent variable data.

In an embodiment of the present disclosure, the step of selecting the congestion data set from the entire-area cell data set according to a specified rule includes: analyzing the dependent variable data and the independent variable data by the comparison detection method; taking the dependent variable data whose number of comparison detections satisfies a preset number as congestion indicator data; and taking n items of independent variable indicator data whose degree of association with the congestion indicator data satisfies a preset value as the congestion data set, where n is a positive integer.

More specifically, the congestion data set may be obtained as follows. The independent variables (load-type indicators) and the dependent variables (cumulative perception indicators) are determined. Learning is performed on data from sessions in which customers complained or O&M personnel explicitly marked that congestion occurred, or in which both the mean and the standard deviation of the load index over the whole session rank in the top 20%. High load of the independent variables is determined first; dependent-variable indicators highly correlated with it are then found by comparison detection, and the threshold of each dependent variable (the perception-indicator threshold) is determined. Finally, cells that are generally low-load, or whose perception degradation lasts only a short time, are excluded to obtain the congestion data set.

In an embodiment of the present disclosure, after the step of taking the n items of independent variable indicator data whose degree of association with the congestion indicator data satisfies a preset value as the congestion data set, the method further includes: labeling the second load index data by the gradient boosting decision tree (GBDT) method to obtain a labeling result, wherein the labeling result includes a first burst level and a second burst level.

It should be noted that the burst load model includes at least one of the following items of information: the input data length (also called the observation time window, i.e., the span of data on which the burst load model is matched; the specific model is obtained after matching); the advance amount; the burst load width (or burst load duration); and the burst load level. The burst load level includes at least the first burst level and the second burst level.

Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the technical solution of the present disclosure can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, etc.) to perform the methods described in the various embodiments of the present disclosure.

The embodiments of the present disclosure also provide a device for predicting burst load. The device is used to implement the above embodiments and example implementations, and what has already been described will not be repeated. As used below, the term "module" may refer to a combination of software and/or hardware that performs predetermined functions. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated.

FIG. 2 is a structural block diagram of a burst load prediction device according to an embodiment of the present disclosure. As shown in FIG. 2, the burst load prediction device according to an embodiment of the present disclosure includes a collection module 20 and a prediction module 22.

The collection module 20 is used to collect load index data of the specified area, wherein the load index data is used to characterize the load of the specified area.

The prediction module 22 is used to analyze the load index data using a burst load model to predict the occurrence of the burst load, where the burst load model is trained through machine learning using multiple sets of data.

Through the present disclosure, load index data of the designated area, which characterizes the load of that area, is collected and analyzed using a burst load model trained by machine learning on multiple sets of data, so as to predict the occurrence of a burst load. This solves the problem in the related art that traditional technical solutions for handling real-time congestion caused by sudden load surges are costly, so that such congestion can be handled while labor costs are reduced.

As shown in FIG. 3, in the embodiment of the present disclosure, the device further includes a selection module 24 for selecting a burst load model suitable for the designated area from a plurality of burst load models.

In an embodiment of the present disclosure, the collection module 20 is configured to: obtain historical data collected at the base station; pre-process the historical data to obtain the entire-area cell data set; select a congestion data set from the entire-area cell data set according to a specified rule; and annotate the congestion data set to obtain the plurality of burst load models.

In an embodiment of the present disclosure, the entire-area cell data set includes independent variable data and dependent variable data. The collection module 20 is configured to obtain independent variable data and dependent variable data that satisfy preset conditions, wherein the independent variable data includes load data, and the dependent variable data has a specified functional relationship with the independent variable data.

In the embodiment of the present disclosure, the selection module 24 is further configured to analyze the dependent variable data and the independent variable data according to the comparison detection method; use the dependent variable data whose comparison detection times satisfy a preset number of times as the congestion indicator data; And using n independent variable index data whose degree of association with the congestion indicator data satisfies a preset value as the congestion data set.

In an embodiment of the present disclosure, after the n items of independent variable indicator data whose degree of association with the congestion indicator data satisfies a preset value are taken as the congestion data set, the selection module 24 is further configured to label the second load index data by the gradient boosting decision tree (GBDT) method to obtain a labeling result, wherein the labeling result includes a first burst level and a second burst level.

The above technical solutions overcome the following technical problems: areas with frequent congestion, the features that identify congestion, and the main causes of congestion cannot be identified automatically; congestion prediction depends on predicting the real-time traffic value, which is strongly tied to random user behavior and is therefore difficult or very costly to predict accurately; and there is no way to label the data accurately.

An embodiment of the present disclosure also provides an electronic device which, as shown in FIG. 4, includes a memory 40 and a processor 42. The memory 40 stores a computer program, and when the processor 42 runs the computer program, the burst load prediction method of the embodiments of the present disclosure is performed.

It should be noted that the technical solutions of the foregoing embodiments may be used in combination, or may be used alone, which is not limited in the embodiments of the present disclosure.

The above technical solutions are described below in conjunction with exemplary embodiments, but are not used to limit the technical solutions of the embodiments of the present disclosure.

The technical solution of the exemplary embodiment is mainly applied to the scenario shown in FIG. 5, which takes a Long Term Evolution (LTE) system as an example. In FIG. 5, OMC denotes the network management system, the Evolved Node B (eNB) denotes a base station, and UE denotes user equipment (mobile phone, tablet, etc.). FIG. 5 also shows an edge computing center (Edge Computing Unit, ECU), which is connected to all eNBs, may exist as a standalone unit or reuse an existing eNB board, and is responsible for real-time computation in the venue, such as online inference, real-time counter collection, and recognition of auxiliary audio and video information; and an External Source Unit (ESU) for collecting external data, such as cameras, drones, and temperature and humidity sensors. If such external data are available, the performance of the venue assurance algorithms can be improved. The ESU is optional and does not affect the core functions.

FIG. 6 is an overall flowchart according to an exemplary embodiment of the present disclosure. As shown in FIG. 6, the flow includes the following steps S402 to S420.

In step S402, historical data is collected and integrated. The granularity of the real-time data (from eNB counters) is 5 to 15 seconds; in this exemplary embodiment of the present disclosure it is 10 seconds.

Data or data sets containing load and capacity indicators and the related key system performance indicators are collected to build the evaluation criteria. Data sources include network management performance statistics, alarm data, measurement reports, counter data reported by the base station in real time, and unstructured data (log files, on-site surveillance images and videos, etc.). In this disclosure, the load/capacity indicators finally used for quantitative processing, the Key Performance Indicators (KPIs), the Key Quality Indicators (KQIs), and the User Behavior Indicators (UBIs) are structured numerical data, so load/capacity-related fields must be extracted from unstructured data and converted into structured form. The independent sub-steps involved (which have no required order) include: collecting historical data of the wireless-side foreground; and collecting non-communication-system data, such as grandstand surveillance video during a match (used to identify in real time how densely the audience is operating mobile phones).

In step S404, a data health check and data reduction are performed.

Operations such as anomaly removal, completion, conversion, merging, splitting, and derivation are performed on the raw historical data fields to generate "Feature" fields suitable for a quantitative mining and evaluation system. This step covers both internal system data and ESU data; the two can be processed in parallel, with no required order. On completion of this step, the pre-processed venue cell data set {CellDataOrigin} is output.
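As a rough illustration of the anomaly-removal and completion operations, the sketch below assumes each cell counter arrives as a list sampled at a fixed granularity, with `None` marking a missing report. It treats negative counter values as anomalies and fills interior gaps of up to `max_gap` samples by linear interpolation. These rules, and the names `health_check` and `max_gap`, are hypothetical stand-ins; the source does not specify its rule base.

```python
from typing import List, Optional


def health_check(samples: List[Optional[float]], max_gap: int = 2) -> List[Optional[float]]:
    """Clean one counter series sampled at a fixed granularity (hypothetical rules)."""
    # Anomaly removal (assumed rule): negative counter values are treated as missing.
    cleaned: List[Optional[float]] = [
        None if v is None or v < 0 else float(v) for v in samples
    ]
    n = len(cleaned)
    i = 0
    while i < n:
        if cleaned[i] is not None:
            i += 1
            continue
        start = i  # first missing sample of this gap
        while i < n and cleaned[i] is None:
            i += 1
        gap = i - start  # cleaned[i], if i < n, is the first valid sample after the gap
        # Completion (assumed rule): interpolate only short gaps with valid neighbours on both sides.
        if start > 0 and i < n and gap <= max_gap:
            left, right = cleaned[start - 1], cleaned[i]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                cleaned[start + k] = left + step * (k + 1)
    return cleaned
```

Longer gaps and gaps at either end are deliberately left empty rather than invented, so a later step can decide whether to drop the affected records.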

The embodiments of the present disclosure mainly use the following internal system data: the number of user connections, physical resource block utilization, the number of traffic bytes, the number of neighboring-cell pairs, the wireless-side transmission delay, the MCS (modulation and coding scheme) level indication, and the KPIs of core concern to the customer.

External data (optional) is used together with internal data to improve the accuracy of burst load or congestion determination. Audio and video are mainly used to extract the moments when the audience is excited and when mobile phones are being operated at high density.

In step S406, congestion scenarios are identified adaptively and the congestion indicators are determined.

The input data set is the entire-area cell data set {CellDataOrigin} output in step S404. The core idea is as follows: when a large number of user connections or data services appear suddenly, the admission/handover capabilities of the system equipment are limited, so certain corresponding network KPIs deteriorate after a certain delay. This forms a causal relationship in which the independent variables are load indicators (such as the number of connections, the number of traffic bytes, and the hardware load rate) and the dependent variables are certain system or network KPIs. In this step, the sequence comparison detection method is used to determine the corresponding dependent variables given the known independent variables. At the same time, the confidence range of the delay time is determined from the information contained in the historical data.

Congestion manifests itself, for example, as users being unable to access the network or receive data, or as very slow transmission and reception rates.

Only a load surge that degrades customer experience is regarded as a burst load. Wireless-side data can be used to identify such surges, for example sessions in which customers complained or O&M personnel explicitly marked a congestion period, or sessions in which both the mean and the standard deviation of the load index over the whole session rank in the top 20%. Sequence comparison detection methods can also be used.
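The session-selection rule above can be sketched as follows. The sketch assumes each session is summarized by its per-granularity load-index samples, and that complaint or O&M congestion marks are supplied as a set of session IDs. "Top 20%" is read here as both the mean and the population standard deviation ranking in the top 20% of sessions, which is one possible interpretation. All names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Set


def top_fraction(scores: Dict[str, float], fraction: float = 0.2) -> Set[str]:
    """Keys whose score ranks within the top `fraction` of all keys (at least one key)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda key: scores[key], reverse=True)
    return set(ranked[:k])


def select_candidate_sessions(load: Dict[str, List[float]], flagged: Set[str],
                              fraction: float = 0.2) -> Set[str]:
    """Sessions flagged by complaints/O&M marks, plus sessions whose load-index
    mean and standard deviation both rank in the top `fraction`."""
    means = {s: statistics.fmean(v) for s, v in load.items()}
    stds = {s: statistics.pstdev(v) for s, v in load.items()}
    high_load = top_fraction(means, fraction) & top_fraction(stds, fraction)
    return (flagged & set(load)) | high_load
```

Requiring both statistics to be high favours sessions whose load is not only heavy but also fluctuating, which is closer to a surge than a uniformly busy session.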

1) If the eNBs mainly covering the venue (equivalent to the designated area in the above embodiments) run versions whose admission control and load-balancing algorithms and thresholds differ significantly, the data must be divided into different categories and processed separately.

2) The load indicators form the independent-variable group and are normalized and discretized: records above the 80th percentile are set to 1, and the rest to 0.

3) The delay and signal-to-interference-plus-noise ratio (SINR) indicators form the dependent-variable group. Their mean and standard deviation are computed; records beyond (mean + 3 × standard deviation) are set to 1, and the rest to 0.

4) The time axes are aligned and the independent-variable group is compared with the dependent-variable group. The number of times the dependent variable rises 1 to 2 time granularities after the independent variable and stays raised for more than 2 granularities is counted.

5) The two dependent-variable indicators with the highest comparison-detection counts are selected as wireless-side congestion indicators. Based on 4), the two independent-variable load indicators with the highest differential correlation with the congestion indicators are selected as congestion load indicators.

6) The congestion indicators of all congestion sessions are collected, and their (mean + 2 × standard deviation) is defined as the congestion recognition threshold TH_CONGESTION.
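Items 2) to 6) can be sketched as below. Assumptions, not taken from the source: percentiles use `statistics.quantiles` with the inclusive method; standard deviations are population standard deviations; "delayed by 1 to 2 granularities" means the dependent variable switches from 0 to 1 exactly 1 or 2 steps after the load indicator switches from 0 to 1; and ties between indicators are broken by name. The selection of load indicators by differential correlation in item 5) is not shown. Function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def discretize_load(x: Sequence[float], pct: int = 80) -> List[int]:
    """Independent-variable group: 1 for records above the pct-th percentile, else 0."""
    cut = statistics.quantiles(x, n=100, method="inclusive")[pct - 1]
    return [1 if v > cut else 0 for v in x]


def discretize_kpi(y: Sequence[float], k_sigma: float = 3.0) -> List[int]:
    """Dependent-variable group: 1 for records above mean + k_sigma * std, else 0."""
    cut = statistics.fmean(y) + k_sigma * statistics.pstdev(y)
    return [1 if v > cut else 0 for v in y]


def run_length(y: Sequence[int], start: int) -> int:
    """Number of consecutive 1s in y beginning at index start."""
    n = 0
    while start + n < len(y) and y[start + n] == 1:
        n += 1
    return n


def comparison_count(x: Sequence[int], y: Sequence[int],
                     lags: Sequence[int] = (1, 2), min_run: int = 3) -> int:
    """Count load rises followed, after one of `lags`, by a KPI rise lasting >= min_run steps."""
    count = 0
    for t in range(len(x)):
        if x[t] == 1 and (t == 0 or x[t - 1] == 0):  # load indicator rises at t
            for d in lags:
                s = t + d
                if s < len(y) and y[s] == 1 and y[s - 1] == 0 and run_length(y, s) >= min_run:
                    count += 1
                    break  # count each load rise at most once
    return count


def select_congestion_indicators(x: Sequence[int], kpis: Dict[str, List[int]],
                                 top: int = 2) -> List[str]:
    """Names of the `top` KPIs with the highest comparison-detection counts."""
    scores = {name: comparison_count(x, y) for name, y in kpis.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:top]


def congestion_threshold(values: Sequence[float]) -> float:
    """TH_CONGESTION = mean + 2 * std of the congestion indicator over congestion sessions."""
    return statistics.fmean(values) + 2 * statistics.pstdev(values)
```

Here `min_run=3` encodes "stays raised for more than 2 granularities". For an indicator such as SINR, where degradation means a drop, the sign of the series would presumably need to be flipped before calling `discretize_kpi`.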

All historical data is then screened for congestion scenarios. If the cell load in a given venue during a given match is very low (for example, because there are few spectators), that venue's data is excluded from the training set.

Optionally, the load may be considered very low when the number of Radio Resource Control (RRC) connections, the Physical Resource Block (PRB) utilization, and the average byte traffic are all below preset thresholds. Different operators may set different requirements for these thresholds.

The congestion indicators of each whole session are scanned against the congestion indicator threshold TH_CONGESTION determined above. If the total time for which the congestion indicator exceeds the threshold is less than 60 seconds, the session is deemed to contain no congestion and is removed from the training data.

Obtain and save historical data of all congested sessions {CellDataCongestion}.
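A minimal sketch of this screening, assuming a hypothetical per-session layout (`SessionData`) in which every series holds one value per 10-second granularity, that "very low load" compares the session means of RRC connections, PRB utilization, and byte traffic with operator-chosen thresholds, and that a higher congestion-indicator value means worse congestion (as for delay). The operator thresholds in the usage example are illustrative only.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SessionData:
    """Per-granularity series of one cell over one session (hypothetical layout)."""
    rrc_connections: List[float]
    prb_utilization: List[float]
    byte_traffic: List[float]
    congestion_indicator: List[float]


def is_low_load(s: SessionData, thresholds: Tuple[float, float, float]) -> bool:
    """True when the RRC, PRB and byte-traffic means are all below their thresholds."""
    th_rrc, th_prb, th_bytes = thresholds
    return (statistics.fmean(s.rrc_connections) < th_rrc
            and statistics.fmean(s.prb_utilization) < th_prb
            and statistics.fmean(s.byte_traffic) < th_bytes)


def congested_seconds(s: SessionData, th_congestion: float, granularity_s: int = 10) -> int:
    """Total time for which the congestion indicator exceeds TH_CONGESTION."""
    return granularity_s * sum(1 for v in s.congestion_indicator if v > th_congestion)


def screen_sessions(sessions: Dict[str, SessionData], th_congestion: float,
                    low_load_thresholds: Tuple[float, float, float],
                    granularity_s: int = 10, min_congested_s: int = 60) -> Dict[str, SessionData]:
    """Keep only sessions that are not low-load and contain enough congestion."""
    kept: Dict[str, SessionData] = {}
    for sid, s in sessions.items():
        if is_low_load(s, low_load_thresholds):
            continue  # e.g. few spectators: excluded from the training set
        if congested_seconds(s, th_congestion, granularity_s) < min_congested_s:
            continue  # deemed to contain no congestion
        kept[sid] = s
    return kept
```

For example, with `th_congestion=10.0` and thresholds `(50.0, 0.3, 1e4)`, a busy session with seven congested 10-second granularities (70 s) is kept, while one with five (50 s) is dropped.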

In step S408, the historical data is labeled adaptively, using the data set {CellDataCongestion}. This step is performed separately for each group of venues whose base stations run essentially the same software and hardware versions.

Scan and mark the congestion indicators of all cells in the data set according to TH_CONGESTION.

The congestion periods marked during "collection of historical data of the wireless-side foreground" are recorded, and the first-order difference sequence of the congestion load index is scanned, one time granularity at a time, within the search interval. A load surge is judged to be a burst load only when the load index is trending upward. For the meaning of the search interval, refer to FIG. 8 and the description of the delay time in step S406. The search range is determined from the statistics of historical data; for example, 95% of the delays fall within 5 to 35 seconds.
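The interval scan can be sketched as follows, assuming a 10-second granularity and the 5 to 35 second example range, which maps to looking 1 to 3 granularities before the congestion start. Treating the earliest positive first-order difference in that window as the onset is a simplifying assumption; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def find_burst_onset(load: Sequence[float], congestion_start: int, granularity_s: int = 10,
                     min_delay_s: int = 5, max_delay_s: int = 35) -> Optional[int]:
    """Earliest index in the search interval where the load index rises, or None."""
    lo = max(1, congestion_start - max_delay_s // granularity_s)
    hi = congestion_start - math.ceil(min_delay_s / granularity_s)
    for t in range(lo, hi + 1):
        if load[t] - load[t - 1] > 0:  # positive first-order difference: upward trend
            return t
    return None  # no upward trend: not treated as a burst load
```

For instance, with load `[5, 5, 5, 5, 8, 12, 12, 12]` and congestion starting at index 6, the window covers indices 3 to 5 and the onset is found at index 4.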

The time TA at which the congestion load index starts to rise is determined based on "collection of non-communication system data".

The congestion periods are divided and organized according to the scenario, business significance, and statistical characteristics of the historical data. For example, in FIG. 7 each square represents one time granularity, and the dark squares marked "C" represent congestion periods. When two congestion segments are separated by one time granularity, that granularity is also marked as congestion; that is, two congestion segments separated by only one time granularity are treated as one segment. When two segments are separated by two or more time granularities, they are treated as two independent congestion segments. This sub-step makes the subsequent data labeling clearer and raises the congestion recognition rate.
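The merging rule can be expressed compactly. The sketch below assumes congestion is given as one 0/1 flag per time granularity and returns inclusive (start, end) index pairs; segments separated by at most `max_gap` non-congested granularities (1 in the example above) are merged, which implicitly marks the gap as congestion. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def congestion_segments(flags: Sequence[int], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Inclusive (start, end) congestion segments, merging gaps of <= max_gap granularities."""
    segments: List[Tuple[int, int]] = []
    for t, flag in enumerate(flags):
        if not flag:
            continue
        if segments and t - segments[-1][1] - 1 <= max_gap:
            segments[-1] = (segments[-1][0], t)  # extend: gap is short enough to merge
        else:
            segments.append((t, t))  # start a new, independent congestion segment
    return segments
```

With flags `[1, 1, 0, 1, 0, 0, 1]`, the one-granularity gap is absorbed and the two-granularity gap splits the series, giving `[(0, 3), (6, 6)]`.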

In step S410, the optimal model parameter combination is adaptively searched.

The optimal boundary conditions for model training are determined by adaptive search. The optimal boundary combination is used to approach the upper limit of the congestion prediction recognition rate contained in the data set. The underlying principle is that this upper limit is determined by the information contained in the data set itself, and that the congestion recognition rate is the model function, determined by the boundary conditions, applied to the input feature matrix.

This step uses a simple approximation method to search for the upper limit of the congestion recognition rate and the corresponding optimal parameter set {boundary condition 1, boundary condition 2, ..., boundary condition n}.

As shown in FIG. 8, the burst load prediction model has four boundary conditions: the number of input data records (the time-dimension boundary), the burst load advance amount, the burst load width, and the burst load level.

By performing an exhaustive search or an optimization over these four boundary parameters, the combination of boundary parameters best suited to the current service is found (the best burst load or congestion recognition effect for the service is equivalent to the highest congestion prediction accuracy).
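An exhaustive search over the four boundary parameters might look like the sketch below. The parameter names (`input_len`, `advance`, `width`, `level`), the candidate grids, and the `evaluate` callback are hypothetical; in practice `evaluate` would train and validate a model under the given boundary conditions and return its congestion recognition rate.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def exhaustive_search(grid: Dict[str, Sequence[int]],
                      evaluate: Callable[..., float]) -> Tuple[Dict[str, int], float]:
    """Try every combination of boundary parameters and keep the best-scoring one."""
    names = list(grid)
    best_params: Dict[str, int] = {}
    best_score = float("-inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(**params)  # e.g. validated congestion recognition rate
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The grid size grows multiplicatively with each parameter, which is why the source also allows an optimization method in place of a full search.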

In step S412, the real-time congestion recognition model is trained.

The input data is the labeled data obtained in step S408, and the boundary parameters are the optimal parameter combination obtained in step S410. Without steps S406 to S410, this step would face a complex network with many hyperparameters, a huge amount of computation, and difficulty in finding an optimal or near-optimal congestion recognition rate. Steps S406 to S410 are equivalent to separating relatively independent hyperparameter subspaces from the data space (steps S406 to S408) and solving for them separately (step S410). This step therefore only has to handle a relatively simple data space, which eases solving and model training; for example, in Example 1, the problem faced by this step has been decomposed into a simple model solvable with a linear Support Vector Machine (SVM).
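The source names a linear SVM but no solver. As one possibility, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient algorithm (a swapped-in technique, not necessarily the one used in the source), appending a constant feature so the last weight acts as a regularized bias. Feature vectors and ±1 labels (congestion vs. no congestion) are assumed to come from the labeled data; all names are hypothetical.

```python
import random
from typing import List, Sequence


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> List[float]:
    """Pegasos training of a linear SVM (hinge loss + L2); labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature: last weight acts as the bias
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * _dot(w, data[i]) < 1  # hinge loss is active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularization (shrink) step
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 (congestion) or -1 (no congestion) for one feature vector."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0 else -1
```

The optional projection step of Pegasos is omitted for brevity; `lam` plays the role of the inverse penalty parameter mentioned in the next paragraph.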

Optionally, the data are divided into training and validation sets by cross-validation, and model evaluation and optimal model selection are performed. The criterion for the burst load forecasting model may be, for example, structural risk minimization; correspondingly, the model hyperparameters mainly concern the convergence conditions and the penalty parameters, so as to improve generalization ability.
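A minimal k-fold cross-validation sketch follows. It assumes samples are indexed in time order and uses contiguous folds without shuffling, so that neighbouring time steps mostly stay in the same fold; this is a design assumption, not something the source specifies. `fit` and `score` are hypothetical callbacks supplied by the caller.

```python
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validate(n: int, k: int, fit: Callable[[List[int]], object],
                   score: Callable[[object, List[int]], float]) -> float:
    """Mean validation score of the models trained by `fit` on each training split."""
    scores = [score(fit(train), val) for train, val in k_fold_indices(n, k)]
    return sum(scores) / len(scores)
```

Each candidate model (for example, different penalty parameters) would be scored with `cross_validate`, and the one with the highest mean validation score kept.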

Through model evaluation, the best load prediction model suitable for the current area and current business is selected from the candidate models as the application model for real-time calculation.

In step S414, the real-time congestion recognition model is released.

Model training and online model application can be performed on the same computing node or on different nodes, so they are separated as logical functions. The model trained in step S412 is saved in a general-purpose or proprietary format and published/delivered to the online application node (ECU).

In step S416, the real-time congestion recognition model is applied online.

The obtained model is put into online use and run on the ECU.

In step S418, the online performance of real-time congestion recognition is monitored and evaluated.

The misjudgment rate and the missed-judgment rate of the online model for burst loads are measured. At the same time, other on-site indicators are combined to determine whether errors stem from the model itself, from an abrupt change in user behavior patterns, or from other causes. Burst load forecasting generally runs alongside other assurance measures and serves as a reference for dynamic network parameter configuration.
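The source does not define the two rates precisely. The sketch below assumes per-granularity 0/1 predictions and ground truth, and takes the misjudgment rate as the share of predicted bursts that did not occur and the missed-judgment rate as the share of actual bursts that were not predicted. Both definitions and the function name are assumptions.

```python
from typing import Sequence, Tuple


def misjudgment_and_miss_rates(predicted: Sequence[int],
                               actual: Sequence[int]) -> Tuple[float, float]:
    """(misjudgment rate, missed-judgment rate) from per-granularity 0/1 labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    misjudgment = fp / (tp + fp) if tp + fp else 0.0  # false alarms among predicted bursts
    missed = fn / (tp + fn) if tp + fn else 0.0  # actual bursts that were not predicted
    return misjudgment, missed
```

Tracking both rates over time gives the evidence needed in step S420 to decide whether to keep, close, or recalculate the strategy.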

In step S420, the strategy is maintained, closed, or recalculated.

According to the evaluation result of step S418, it is decided whether the current prediction and pre-optimization strategy is maintained (if effective) or closed (if it has failed). Because burst load forecasting is a real-time strategy, the model should be periodically retrained and re-evaluated on the latest data.

During sports competitions, concerts, large-scale gatherings and similar events, the cell load may increase suddenly, causing the user rate to drop or even the connection to be interrupted. To ensure service quality, one solution is to dynamically adjust the parameters of the execution node based on burst load prediction, taking measures to avoid or mitigate congestion before the load arrives.

This exemplary embodiment can be understood as a more detailed technical solution of the foregoing exemplary embodiment and includes the following steps.

Step 1. Collect historical and real-time data for the region.

The network unified operation and maintenance management center issues the collection task to the strategy centralized control node. The strategy centralized control node collects the data reported by the eNB at the specified times and sends it to the network unified operation and maintenance management center in an agreed manner. The data fall into three categories: load indicators, cell service quality evaluation indicators, and base station hardware resource occupancy, with a granularity of 5 to 20 seconds. In this embodiment, the time granularity is 10 seconds.
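If raw reports arrive at mixed intervals, they might be aligned to the 10-second granularity by binning. This sketch assumes reports are `(timestamp_s, metric_name, value)` tuples with hypothetical metric names, and averages each bin; max-type indicators such as the maximum number of RRC connections would use `max` instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

GRANULARITY_S = 10  # time granularity used in this embodiment


def aggregate(
    samples: Iterable[Tuple[float, str, float]], granularity_s: int = GRANULARITY_S
) -> Dict[int, Dict[str, float]]:
    """Average (timestamp, metric, value) samples into bins keyed by the bin start time."""
    sums: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, name, value in samples:
        start = int(ts // granularity_s) * granularity_s
        sums[start][name] += value
        counts[start][name] += 1
    return {
        start: {name: sums[start][name] / counts[start][name] for name in sums[start]}
        for start in sorted(sums)
    }
```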

Step 2. Data health check and data normalization. This step is performed automatically according to rules (optionally combined with an expert interface).

This step is performed in the network unified operation and maintenance management center. Because of events such as temporary module failures, congestion of the data transmission link, communication failures, and decoding errors, the collected cell-level data may have several health problems, which need to be checked and preprocessed according to the rule base. Then, according to the feature generation rules, the features used in subsequent calculations are generated from the original data fields.
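The rule base itself is not disclosed. A minimal sketch, assuming three hypothetical rules (drop duplicate timestamps, treat missing/NaN/out-of-range values as invalid, forward-fill invalid values) and one hypothetical generated feature (the first difference of a field):

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# Hypothetical valid ranges per field; a real rule base would be richer.
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "dl_prb_util": (0.0, 100.0),  # percent
    "rrc_conn_max": (0.0, 5000.0),
}


def is_valid(value: Optional[float], lo: float, hi: float) -> bool:
    return value is not None and not math.isnan(value) and lo <= value <= hi


def health_check(records: List[Record], ranges=VALID_RANGES) -> List[Record]:
    """Sort by time, drop duplicate timestamps, forward-fill invalid values."""
    cleaned: List[Record] = []
    seen = set()
    last_good: Dict[str, float] = {}
    for rec in sorted(records, key=lambda r: r["ts"]):
        if rec["ts"] in seen:
            continue  # duplicate report, e.g. after a link retransmission
        seen.add(rec["ts"])
        out: Record = {"ts": rec["ts"]}
        for field, (lo, hi) in ranges.items():
            value = rec.get(field)
            if is_valid(value, lo, hi):
                out[field] = value
                last_good[field] = value
            else:
                out[field] = last_good.get(field)  # None if no valid value seen yet
        cleaned.append(out)
    return cleaned


def add_delta_feature(records: List[Record], field: str) -> List[Record]:
    """Generate a first-difference feature from an original data field."""
    prev = None
    for rec in records:
        cur = rec.get(field)
        rec[field + "_delta"] = None if prev is None or cur is None else cur - prev
        prev = cur
    return records
```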

Step 3. Determine the principles that define a burst load.

In this embodiment, this step is performed in the network unified operation and maintenance management center. The business objective is defined as solving or alleviating severe stalling of user data services. Several data fields represent node load, and the stalling phenomenon is related to multiple network KPIs/KQIs. Therefore, it is necessary to find the load indices and network KPIs/KQIs most closely related to the business objective.

Drawing on the experience of on-site maintenance and optimization personnel, an equivalent evaluation index that best reflects stalling/low download rate is generated. After it is aligned in time with each load index, a correlation algorithm, for example, can be used to detect the consistency between sudden increases in a load index and changes in the equivalent evaluation index. For example, if the change trends of the number of connections, PRB utilization, and traffic bytes are aligned with the start and end of the evaluation index trend, all of them can be classified as independent variables.
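Pearson correlation is one example of such a correlation algorithm. In this sketch the series are assumed to be already time-aligned, and the 0.7 threshold and index names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def consistent_indices(
    load_series: Dict[str, Sequence[float]], eval_series: Sequence[float], threshold: float = 0.7
) -> List[str]:
    """Load indices whose trend moves with the equivalent evaluation index (candidate independent variables)."""
    return sorted(n for n, s in load_series.items() if pearson(s, eval_series) >= threshold)
```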

After removing the load indicators that have no significant business impact, collinearity analysis, such as hierarchical clustering and correlation coefficients, is performed on the remaining load indicators. If a load index has only a weak business manifestation and is collinear with a load index that has a strong business manifestation, it is removed. In this embodiment, the load-type indicators include (but are not limited to): total traffic, data bytes, maximum number of RRC connections, average number of RRC connections, board CPU usage, board memory usage, PRB usage, etc. After the final analysis and comparison, only the maximum number of RRC connections and downlink PRB utilization need to be used as independent variables, that is, there are only two principal components.
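The correlation-coefficient variant of this collinearity removal might be sketched as a greedy filter (hierarchical clustering is not shown). The `relevance` scores standing in for "business manifestation" and the 0.9 cutoff are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def prune_collinear(
    series: Dict[str, Sequence[float]], relevance: Dict[str, float], max_abs_corr: float = 0.9
) -> List[str]:
    """Visit indices from strongest to weakest business relevance and keep one
    only if it is not strongly correlated with any index already kept."""
    kept: List[str] = []
    for name in sorted(series, key=lambda n: relevance[n], reverse=True):
        if all(abs(pearson(series[name], series[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept
```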

During a sudden increase or a sustained peak of the main load index, the consistency of changes in the equivalent evaluation index is checked. As shown in Figure 9, the deterioration of the equivalent stalling/low-download-rate index is used as the dependent variable, and its rising spike should lag behind the rising spike of the load-type independent variable.
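One way to check that the dependent-variable spike lags the load spike is to find the lag (in records) that maximizes the correlation between `load[t]` and `degradation[t + lag]`. This lagged-correlation approach is an assumption for illustration, not the method stated in the disclosure.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def best_lag(load: Sequence[float], degradation: Sequence[float], max_lag: int) -> int:
    """Lag (records) in 0..max_lag that best aligns the degradation index behind the load index."""
    if len(load) != len(degradation):
        raise ValueError("series must be time-aligned and of equal length")
    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        x, y = load[: len(load) - lag], degradation[lag:]
        if len(x) < 3:
            break  # too few overlapping records to correlate
        r = pearson(x, y)
        if r > best_r:
            best, best_r = lag, r
    return best
```

With 10-second records, a returned lag of 2 would mean the degradation spike trails the load spike by about 20 seconds.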

In this embodiment, for the operator's target area, the load indices finally used to define the burst load after data analysis are as follows: downlink PRB utilization is the primary index, supplemented by the number of RRC connections (used only when the number of RRC connections is high); the corresponding cell equivalent evaluation indicators include the downlink PDCP packet average normalized delay and the downlink QPSK coding ratio.

Step 4. Automatically label burst loads in the historical data.

This step is performed in the network unified operation and maintenance management center. Based on the analysis in step 3, the "burst load" data identified as causing cell users to experience stalling are labeled and divided into burst level 1 and burst level 2. In this embodiment, the thresholds in FIG. 10 are determined step by step using, for example, the GBDT method in combination with existing thresholds, the existing guarantee strategy process, and human experience. In Figure 10, DDR denotes the relative ratio of the downlink PDCP average delay, DPR denotes downlink PRB utilization, and RUR denotes the proportion of RRC-connected users. DDR corresponds to the dependent variable indicating congestion, while DPR and RUR correspond to the load-type independent variables.
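The threshold values of FIG. 10 are not given in the text, so the numbers below are hypothetical placeholders. The rule that congestion (DDR) must coincide with high DPR, or with high RUR, is also an assumption, loosely following the PRB-primary, RRC-supplementary definition of step 3.

```python
from typing import Dict

# Hypothetical thresholds; FIG. 10's actual values are not reproduced in the text.
LEVEL1: Dict[str, float] = {"ddr": 1.5, "dpr": 0.7, "rur": 0.6}
LEVEL2: Dict[str, float] = {"ddr": 2.0, "dpr": 0.9, "rur": 0.8}


def burst_level(ddr: float, dpr: float, rur: float,
                level1: Dict[str, float] = LEVEL1, level2: Dict[str, float] = LEVEL2) -> int:
    """Label one record: 2 = burst level 2, 1 = burst level 1, 0 = no burst."""
    def meets(t: Dict[str, float]) -> bool:
        # Congestion in the dependent variable AND high load in an independent variable.
        return ddr >= t["ddr"] and (dpr >= t["dpr"] or rur >= t["rur"])

    if meets(level2):
        return 2
    if meets(level1):
        return 1
    return 0
```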

The data that ultimately remain inseparable are: cells with too few users, cells with unreasonable CA policies, and cells with overloaded eNB boards.

Step 5. Adaptively search for the optimal model parameter combination.

In this embodiment, this step is performed in the network unified operation and maintenance management center. The effectiveness of burst load prediction is determined by the combination of the following time window lengths: the number of input data records (time span); how far in advance of the labeled burst load the prediction is made (advance); and how long the labeled burst load lasts (duration). The burst load level has already been specified in step 4 in accordance with the service specification and is no longer used as a parameter here.

On the historical data, the effect of each parameter combination is evaluated using, for example, the classification accuracy of a multi-class SVM; decision trees and neural networks are alternative methods. The data fed to the classifier are sampled at 10-second intervals. The search ranges are 10 to 100 seconds for the input time span, 10 to 60 seconds for the advance, and 10 to 200 seconds for the burst load duration, each with the data collection granularity as the step size (10 seconds in this embodiment); the relative grade range of the burst load is 30% to 60%.
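The search space above can be enumerated as a grid. In this sketch `evaluate` is a hypothetical callback that would train and score a classifier (for example, returning multi-class SVM accuracy) for one `(span_s, advance_s, duration_s)` combination.

```python
from itertools import product
from typing import Callable, List, Tuple

Combo = Tuple[int, int, int]  # (input span s, advance s, burst duration s)


def search_grid(step_s: int = 10, span: Tuple[int, int] = (10, 100),
                advance: Tuple[int, int] = (10, 60),
                duration: Tuple[int, int] = (10, 200)) -> List[Combo]:
    """All window combinations, inclusive ranges, stepped by the collection granularity."""
    def rng(lo: int, hi: int) -> range:
        return range(lo, hi + 1, step_s)
    return list(product(rng(*span), rng(*advance), rng(*duration)))


def best_combination(grid: List[Combo], evaluate: Callable[[Combo], float]) -> Combo:
    """Pick the combination with the highest evaluation score."""
    return max(grid, key=evaluate)
```

With a 10-second step this gives 10 × 6 × 20 = 1200 combinations, small enough for an exhaustive search offline.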

For the current system equipment and current area, the optimal time window combination is:

{The input time span is 40 seconds, the advance is 10 seconds, and the burst load duration is 60 seconds.}
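With 10-second records, this combination corresponds to 4 input records, an advance of 1 record and a duration of 6 records. The sketch below shows one way to turn a time-ordered series into training samples; the labeling rule (positive only if every record in the duration window is a burst) is an assumption, since the text does not specify it.

```python
from typing import List, Sequence, Tuple


def make_samples(features: Sequence[Sequence[float]], burst: Sequence[int],
                 span_s: int = 40, advance_s: int = 10, duration_s: int = 60,
                 step_s: int = 10) -> List[Tuple[List[float], int]]:
    """Build (flattened input window, label) pairs from time-ordered records."""
    span, adv, dur = span_s // step_s, advance_s // step_s, duration_s // step_s
    if adv < 1:
        raise ValueError("advance must be at least one record")
    samples = []
    # Input records are [end - span, end); the burst window starts `adv` records after the last input.
    for end in range(span, len(features) - dur - adv + 2):
        x = [v for row in features[end - span:end] for v in row]
        start = end - 1 + adv
        target = burst[start:start + dur]
        samples.append((x, int(all(target))))  # assumed rule: burst sustained over the whole duration
    return samples
```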

Step 6. Offline training of the burst load prediction model.

Data from a past competition in a stadium are selected, and offline model training is performed using the parameter combination obtained in step 5 and the principle of structural risk minimization. A Support Vector Classifier (SVC) is chosen as the algorithm, mainly because the computing capability of a single board in existing network base stations is limited and the online model must consume few resources. Data from the past month are used; the input data include the load and equivalent evaluation index data of the cell and its main neighboring cells (from step 3).
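As a lightweight illustration of structural risk minimization with a support vector classifier, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method. This is a swapped-in technique: the embodiment's SVC may use a kernel and a different solver, and burst levels 1 and 2 would need a multi-class extension (e.g. one-vs-rest). Labels are assumed to be -1/+1 and features pre-scaled.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, iterations: int = 5000, seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2*|w|^2 + mean hinge loss. A constant 1.0 is appended
    to each sample so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization (structural risk) term
        if margin < 1.0:  # hinge loss active: move toward the violated sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (burst) or -1 (no burst)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A linear model needs only one dot product per prediction, which fits the stated constraint that the online model must consume few resources.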

The resulting model has a precision of about 0.8, a recall of about 0.7, and an F1-measure of about 0.75.
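The reported figures are mutually consistent: the F1-measure is the harmonic mean of precision and recall, and 2 × 0.8 × 0.7 / (0.8 + 0.7) ≈ 0.747, which rounds to 0.75.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```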

Step 7. The model is delivered.

Before the start of the next game in the stadium, the network unified operation and maintenance management center will deliver the model trained in step 6 to the strategy centralized control node.

Step 8. Online load forecasting and pre-optimization are performed.

In this embodiment, both the real-time computation of the online model and the adjustment-strategy decision made when a burst load is predicted are performed at the strategy centralized control node, so as not to consume the computing power of the eNB. The real-time adjustment strategy decided by the strategy centralized control node is immediately issued to the eNB for execution; meanwhile, detailed information on misjudgments and missed judgments of burst load prediction is monitored and recorded.
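The online loop at the control node might look like the following sketch: keep a sliding window of the most recent records (4 records for a 40-second span), run the delivered model on each new record, hand any predicted burst level to an adjustment callback, and log predictions for later comparison with actual outcomes. The class name, `predict` and `issue` callbacks are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence


class OnlineBurstPredictor:
    """Sliding-window runner for a delivered burst load model (illustrative only)."""

    def __init__(self, predict: Callable[[List[float]], int], span: int = 4,
                 issue: Optional[Callable[[int], None]] = None) -> None:
        self.window: Deque[Sequence[float]] = deque(maxlen=span)
        self.predict = predict
        self.issue = issue or (lambda level: None)  # hypothetical hook toward the eNB
        self.log: List[int] = []  # kept for later misjudgment/missed-judgment analysis

    def on_record(self, record: Sequence[float]) -> Optional[int]:
        self.window.append(record)
        if len(self.window) < self.window.maxlen:
            return None  # not enough history yet
        flat = [v for row in self.window for v in row]
        level = self.predict(flat)
        self.log.append(level)
        if level > 0:
            self.issue(level)  # trigger the adjustment strategy before the load arrives
        return level
```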

In this embodiment, subsequent processing measures include, for example, centralized load balancing and automatic adjustment of high traffic parameters.

Step 9. Evaluation and subsequent processing.

In this embodiment, the evaluation criteria include regional spectrum efficiency, average delay, and user complaint rate (an operator index). According to these criteria, the effectiveness of the burst load prediction and of the subsequent supporting measures is comprehensively judged, and the hyperparameters used to train the optimal model are adjusted accordingly.

In summary, the technical solution of the present disclosure achieves the following beneficial effects: the architecture can be deployed flexibly and consumes few resources, taking into account the hardware capabilities of both the existing system and next-generation networks; the intermediate results obtained while exploring the data and business rules can support other services under the same system architecture; and the solution can serve as an implementation subset within an overall framework for intelligent operation and maintenance. The disclosed solution can coexist with other intelligent methods, share the architecture with them, and be jointly optimized.

An embodiment of the present disclosure also provides a storage medium that stores a computer program, which when executed by the processor, causes the processor to execute the burst load prediction method according to various embodiments of the present disclosure.

Optionally, in this embodiment, when the computer program is run by the processor, the processor may be caused to perform the steps of: collecting load index data of a specified area, where the load index data is used to characterize the load situation of the specified area; and analyzing the load index data using the burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning using multiple sets of data.

Optionally, in this embodiment, the above storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementation manners, and details are not repeated in this embodiment.

Those skilled in the art should understand that the above modules or steps of the present disclosure can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be performed in an order different from that described here. Alternatively, they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present disclosure is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principles of this disclosure shall be included in the protection scope of this disclosure.

Claims

A burst load prediction method, comprising:

Collecting load index data of a specified area, wherein the load index data is used to characterize the load of the specified area; and

Analyzing the load index data using a burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning using multiple sets of data.
The method according to claim 1, wherein before the step of analyzing the load index data using the burst load model, the method further comprises:

Selecting a burst load model suitable for the designated area from a plurality of burst load models.
The method according to claim 2, wherein before the step of selecting a burst load model suitable for the designated area from a plurality of burst load models, the method further comprises:

Obtaining historical data collected at the base station;

Preprocessing the historical data to obtain a data set of all cells in the area;

Selecting a congestion data set from the data set of all cells in the area according to a specified rule; and

Annotating the congestion data set to obtain a plurality of burst load models.
The method according to claim 3, wherein the historical data includes at least one of the following: load indicator data, network key performance indicator data, key service quality indicator data, and user behavior indicator data.
The method according to claim 3, wherein the data set of all cells in the area includes independent variable data and dependent variable data, and is acquired at least in the following manner:

Obtaining independent variable data and dependent variable data that meet preset conditions,

wherein the independent variable data includes load type data, and the dependent variable data and the independent variable data have a specified functional relationship.
The method according to claim 5, wherein the step of selecting the congestion data set from the data set of all cells in the area according to a specified rule includes:

Analyzing the dependent variable data and the independent variable data according to a comparison detection method;

Using, as congestion indicator data, the dependent variable data whose number of comparison detections meets a preset number; and

Taking, as the congestion data set, n pieces of independent variable index data whose degree of association with the congestion indicator data satisfies a preset value, where n is a positive integer.
The method according to any one of claims 1-6, wherein the burst load model includes at least one of the following information: input data length, burst load advance, burst load width, and burst load level.
A burst load prediction device, comprising:

A collection module, configured to collect load index data of a specified area, wherein the load index data is used to characterize the load of the specified area; and

A prediction module, configured to analyze the load index data using a burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning using multiple sets of data.
The device according to claim 8, further comprising:

A selection module, configured to select a burst load model suitable for the specified area from a plurality of burst load models.
A storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor is caused to execute the burst load prediction method according to any one of claims 1 to 7.
An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and when the processor runs the computer program, the burst load prediction method according to any one of claims 1 to 7 is executed.