CN114996022A - Multi-channel available big data real-time decision making system - Google Patents

Multi-channel available big data real-time decision making system

Info

Publication number
CN114996022A
CN114996022A CN202210838728.6A CN202210838728A CN114996022A CN 114996022 A CN114996022 A CN 114996022A CN 202210838728 A CN202210838728 A CN 202210838728A CN 114996022 A CN114996022 A CN 114996022A
Authority
CN
China
Prior art keywords
data
time
reciprocating
reciprocating time
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210838728.6A
Other languages
Chinese (zh)
Other versions
CN114996022B (en)
Inventor
华俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Huamei Far East Technology Co ltd
Original Assignee
Zhejiang Chuhai Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Chuhai Digital Technology Co ltd
Priority to CN202210838728.6A
Publication of CN114996022A
Application granted
Publication of CN114996022B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a multi-channel available big data real-time decision making system. A node characteristic information extraction module extracts the reciprocating time (round-trip time) that each service node in each Docker cluster needs to process data, together with the balance of the cluster. A data processing module reduces the dimension of the reciprocating time data, merges the dimension reduction data with the balance data, and obtains reconstructed reciprocating time characteristic data by phase space reconstruction. A network generation module trains a reciprocating time characteristic prediction neural network on the reconstructed reciprocating time characteristic data. A real-time decision module predicts the reciprocating time characteristic data at a future moment from real-time data, identifies abnormal clusters from the differences between the predicted reciprocating time characteristic data of the clusters, and reduces the amount of data allocated to the abnormal clusters. The system thereby makes highly available, highly real-time data allocation decisions across multiple channels and prevents anomalies during real-time big data analysis.

Description

Multi-channel available big data real-time decision making system
Technical Field
The invention relates to the technical field of data processing, in particular to a multichannel available big data real-time decision making system.
Background
At present, high concurrency and high availability are commonly achieved by scaling up a cluster of application container engine (Docker) instances so that data requests are processed in a distributed manner. When the Docker cluster hits a performance bottleneck, a large and uncontrollable delay may arise between data processing and persistence to disk, leaving the data out of sync. The synchronization problem can be addressed by algorithms such as the naive Byzantine algorithm, but the performance and availability of the Docker cluster are still affected.
Performance bottlenecks may occasionally occur among the Docker containers because the operating environment of the bare metal server is unknown: the CPU may fall back to its normal frequency after automatic overclocking, CPU/GPU over-temperature protection may be triggered, data patrol in the disk array may cause temporary IO performance jitter, or SAN-based storage may deliver unstable IOPS. As a result, when a bare metal server runs a Docker cluster, the instance corresponding to each channel may not be computed accurately and in time after each service node receives multi-channel data. For tasks that demand high real-time performance, high computing intensity, high concurrency and long running times, such as data analysis for sensor networks and high-frequency trading, the impact of this jitter on big data quality of service cannot be ignored. A general, transparent management method for deciding how to share the load of multi-channel, highly available, highly real-time running data is therefore lacking.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a multichannel real-time big data decision system, which adopts the following technical solutions:
the invention provides a multichannel available big data real-time decision system, which comprises:
the node characteristic information extraction module is used for acquiring reciprocating time required by each service node in each Docker cluster for processing data, and acquiring a reciprocating time sequence corresponding to the Docker cluster; taking the variance of the reciprocating time series as the data processing balance of the Docker cluster;
the data processing module is used for reducing the dimension of the reciprocating time sequence data at continuous moments according to a preset expected dimension to obtain dimension reduction data; merging the dimension reduction data and the balance into new reciprocating time characteristic data; performing phase space reconstruction on the reciprocating time characteristic data to obtain reconstructed reciprocating time characteristic data;
the network generation module is used for training the reciprocating time characteristic prediction neural network according to a large amount of reconstructed reciprocating time characteristic data serving as training data;
the real-time decision module is used for acquiring real-time reconstructed reciprocating time characteristic data of each Docker cluster, inputting it into the reciprocating time characteristic prediction neural network, and obtaining predicted reciprocating time characteristic data at a future moment; obtaining the affinity of each Docker cluster according to the difference between its predicted reciprocating time characteristic data and that of the other Docker clusters; taking each Docker cluster whose affinity is smaller than a preset affinity threshold as an abnormal cluster at the future moment; and reducing the amount of data allocated to the abnormal cluster.
Further, the reducing the dimension of the reciprocating time sequence according to a preset expected dimension, and obtaining dimension-reduced data includes:
reducing the dimension of the reciprocating time sequence by kernel principal component analysis with an RBF kernel function.
Further, the performing phase space reconstruction on the reciprocating time characteristic data to obtain reconstructed reciprocating time characteristic data includes:
and solving the optimal delay time and the embedding dimension of the reciprocating time characteristic data by using an improved C-C method, and reconstructing the reciprocating time characteristic data into a multi-dimensional phase space by using a delay coordinate method according to the optimal delay time and the embedding dimension to obtain the reconstructed reciprocating time characteristic data.
Furthermore, the reciprocating time characteristic prediction neural network adopts an LSTM neural network structure, a BPTT back propagation algorithm is used for training, and a mean square error function is adopted as a loss function.
Further, the obtaining affinity of each Docker cluster according to the difference between the predicted reciprocation time feature data of each Docker cluster and other Docker clusters includes:
obtaining cosine distances between the predicted reciprocating time characteristic data of the target Docker cluster and the other Docker clusters; taking an inverse of an average cosine distance of the target Docker cluster as the affinity of the target Docker cluster.
The invention has the following beneficial effects:
In the embodiment of the invention, reciprocating time data are collected for each service node in each Docker cluster at the same sampling time. The balance of the cluster is evaluated from the variance of the reciprocating time series, and indicates whether jitter is occurring within the current Docker cluster. Because the data collected at continuous times are expensive to compute with and, given the large number of service nodes, the data samples are sparse, the high-dimensional reciprocating time series is reduced to a preset expected dimension to obtain dimension reduction data that are easy to compute with. The dimension reduction data and the balance are merged, and phase space reconstruction is then performed to obtain reconstructed reciprocating time characteristic data. These data are compact and strongly representative of the features, so they can be used for subsequent network training. The trained reciprocating time characteristic prediction neural network can perform predictive analysis on multiple clusters in real time, identify abnormal clusters from the predicted data, and control the data allocated to the abnormal clusters, realizing highly available and highly real-time data allocation decisions across multiple channels.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a block diagram of a multi-channel available big data real-time decision system according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects adopted by the present invention to achieve its intended objects, the embodiments, structures, features and effects of the multi-channel available big data real-time decision making system according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of a multichannel available big data real-time decision system provided by the invention in detail with reference to the accompanying drawings.
The embodiment of the invention targets a multi-channel big data application scenario, that is, a type of service that continuously processes highly real-time, uninterrupted data, such as computation-intensive and IO-intensive big data computing tasks including data analysis for large sensor networks and high-frequency trading.
Referring to Fig. 1, which shows a block diagram of a multi-channel available big data real-time decision making system according to an embodiment of the present invention, the system includes: a node characteristic information extraction module, a data processing module, a network generation module and a real-time decision module.
In modern cloud computing and big data analysis practice, a Docker cluster is generally built on a bare metal server, and all containers on the bare metal server are managed through k8s (Kubernetes). The Docker cluster therefore contains a large number of nodes of different types, all of which run on the kernel layer of the bare metal server.
In the embodiment of the present invention, the nodes configured by k8s are: 12 service nodes, 1 local Redis (Remote Dictionary Server) node and 1 deep neural network inference node. Among these node types, the service node is the main type that processes data, so the information of multiple service nodes needs to be analyzed in subsequent data processing.
The nodes in a typical k8s configuration are not all parallel. In the embodiment of the present invention, for example, the 12 service nodes run in parallel, whereas only one deep neural network inference node needs to run: taking an OpenVINO container as an example, it dynamically manages deep neural network inference requests internally, which ensures higher throughput. Uncontrollable resource preemption may therefore occur among the nodes on memory, CPU cache and other IO resources, and, combined with performance jitter at the hardware level of the bare metal server, this causes the RTT of the 12 service nodes to jitter, which in turn affects the data computing capability of the corresponding service nodes. It should be noted that bare metal servers are usually configured with high specifications and do not jitter frequently in a normal production environment. In a scenario with multiple bare metal servers that share a core network, however, the servers are more easily affected by network jitter, so that database servers cannot be accessed on time. A common example is a SAN scheme for big data database reads and writes: when the bare metal servers share SAN storage, the dynamic load carried by the SAN also affects service delay.
To characterize whether jitter is occurring, the node characteristic information extraction module collects the round-trip time (RTT), also referred to herein as the reciprocating time, that each service node in each Docker cluster needs to process data. RTT generally refers to the round-trip time in network communication; for a multi-channel request, it is the time taken by the service node currently processing the multi-channel data to return a result after computation.
In the embodiment of the present invention, given the differences in modality and data characteristics of multi-channel data, the request round-trip time of a service is assumed to be generally kept at about 3 to 10 ms. Therefore, while each service node processes multi-channel data, its average round-trip time is sampled at a frequency of 100 Hz.
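The 100 Hz sampling described above can be sketched as bucketing completed requests into 10 ms windows and averaging per node. This is only an illustration: the event tuple layout, the integer microsecond timestamps and the node names are assumptions, not part of the patent text.

```python
from collections import defaultdict

WINDOW_US = 10_000  # 100 Hz sampling -> one window every 10 ms (in microseconds)


def average_rtt_per_window(events, window_us=WINDOW_US):
    """Average RTT per (window index, node).

    events: iterable of (completion_time_us, node_id, rtt_ms) tuples
    (hypothetical layout chosen for this sketch).
    """
    acc = defaultdict(lambda: [0.0, 0])  # (window, node) -> [rtt sum, count]
    for t_us, node, rtt_ms in events:
        slot = acc[(t_us // window_us, node)]
        slot[0] += rtt_ms
        slot[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


# Hypothetical sample: two requests on svc-0 in window 0, one in window 1.
events = [
    (1_000, "svc-0", 4.0),
    (6_000, "svc-0", 6.0),
    (12_000, "svc-0", 9.0),
    (3_000, "svc-1", 5.0),
]
averages = average_rtt_per_window(events)
```

Integer timestamps are used so that window boundaries do not depend on floating-point rounding.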
Because random resource preemption exists among the nodes, the reciprocating times of the service nodes differ to some degree. If the Docker cluster enters hardware overload protection because of load distribution errors or other abnormal conditions within the Docker cluster, the reciprocating time data of the corresponding service nodes differ in complex ways. The variance of the reciprocating time sequence formed by the service nodes is therefore used as the data processing balance of the Docker cluster. That is, in the embodiment of the present invention, each sampling time yields 12 reciprocating time values and one corresponding balance value, all generated synchronously.
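As a minimal sketch of the balance measure, the variance of the 12 per-node values at one sampling time can be computed as follows. The text does not say whether population or sample variance is meant; population variance is assumed here, and the RTT values are made up for illustration.

```python
from statistics import pvariance


def cluster_balance(rtts_ms):
    """Balance of a cluster at one sampling time: variance of its node RTTs.

    Population variance is an assumption; the source only says "variance".
    """
    return pvariance(rtts_ms)


# Hypothetical samples for 12 service nodes (milliseconds).
steady = [5.0] * 12               # all nodes agree -> no jitter
jittery = [5.0] * 11 + [17.0]     # one node is delayed

balance_steady = cluster_balance(steady)
balance_jittery = cluster_balance(jittery)
```

A larger value means the nodes disagree more, which is the signal the text associates with jitter inside the cluster.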
Each sampling time corresponds to one reciprocating time sequence and one balance value, so over a continuous acquisition period the collected reciprocating time data are high-dimensional and voluminous. Such high-dimensional data may suffer from sparse samples, difficult subsequent processing and a tendency to overfit. The collected reciprocating time sequence therefore needs dimension reduction. To reduce the subsequent computation without affecting accuracy, the dimension reduction data must retain a clear logical association with the high-dimensional data, so the data processing module reduces the dimension of the reciprocating time sequence using kernel principal component analysis with an RBF kernel function.
In the embodiment of the present invention, the preset desired dimension is set to 3 dimensions. In a specific implementation process, the expected dimension needs to be adjusted according to the number of service nodes in the Docker cluster, which is not limited herein.
It should be noted that the process of performing low-dimensional transformation by using the kernel principal component analysis method of the RBF kernel function is the prior art disclosed in the art, and only the process is briefly described here:
(1) Compute the kernel matrix of the high-dimensional reciprocating time series, compute the eigenvalues and eigenvectors of the kernel matrix, sort the eigenvalues in descending order, and take the first 3 eigenvalues of the sorted sequence together with the corresponding eigenvectors.
(2) For the 12-dimensional RTT index samples [Figure DEST_PATH_IMAGE001], where [Figure DEST_PATH_IMAGE002] to [Figure DEST_PATH_IMAGE003] are the reciprocating times of a given dimension at different times, obtain the twelve-dimensional reciprocating time data matrix [Figure DEST_PATH_IMAGE004]. Here [Figure DEST_PATH_IMAGE005] indexes the different sampling instants of a given dimension, and 12 denotes the 12 dimensions.
(3) Compute the RBF kernel matrix [Figure DEST_PATH_IMAGE006] and center it as [Figure DEST_PATH_IMAGE007], specifically:
[Figure DEST_PATH_IMAGE008]
[Figure DEST_PATH_IMAGE009]
where [Figure DEST_PATH_IMAGE010] is an i x i matrix whose elements are all 1/i, and [Figure DEST_PATH_IMAGE011] is the kernel parameter.
(4) Compute the eigenvalues and eigenvectors of [Figure DEST_PATH_IMAGE007], sort the eigenvalues [Figure DEST_PATH_IMAGE012] in descending order, and take the first E eigenvalues [Figure DEST_PATH_IMAGE013] and the corresponding eigenvectors [Figure DEST_PATH_IMAGE014].
(5) Compute the dimension reduction result [Figure DEST_PATH_IMAGE015] from the kernel matrix and the eigenvalues and eigenvectors selected above.
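Since the formula images are not available, the steps above can be sketched with the textbook form of RBF kernel PCA. The parameterization exp(-gamma * ||x - y||^2), the value of gamma, the function name and the synthetic data are all assumptions for illustration; the patent's exact kernel parameter and symbols are not recoverable from the text.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=3, gamma=1.0):
    """Project the rows of X (samples x 12 RTT dimensions) to n_components.

    Textbook kernel PCA; gamma is an assumed kernel parameter.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between sampling instants.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative rounding errors
    K = np.exp(-gamma * sq_dists)         # RBF kernel matrix

    # Center the kernel matrix with the n x n matrix whose entries are 1/n.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Eigen-decomposition of the symmetric matrix, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the training samples on the leading components.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Synthetic stand-in for 200 sampling instants of 12 service-node RTTs (ms).
rng = np.random.default_rng(0)
X = 5.0 + rng.normal(size=(200, 12))
Z = rbf_kernel_pca(X, n_components=3, gamma=0.05)
```

Each row of `Z` is the three-dimensional dimension reduction data for one sampling instant, matching the expected dimension of 3 used in the embodiment.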
The dimension reduction data obtained by the data processing module shorten subsequent processing, lower the hardware requirements of the subsequent network generation module, and highlight the reciprocating time characteristics that appear when resources are preempted.
Both the dimension reduction data and the balance data carry reciprocating time characteristics of the Docker cluster. Hardware resource preemption behaves as a chaotic system, meaning that it is complex and multi-modal to analyze. So that the neural network in the subsequent network generation module can accurately predict the reciprocating time characteristics at a future moment, the new reciprocating time characteristic data formed by merging the dimension reduction data and the balance therefore need further characteristic analysis.
In the embodiment of the invention, the three-dimensional dimension reduction data and the corresponding balance data together form four-dimensional reciprocating time characteristic data, denoted [Figure DEST_PATH_IMAGE016]. The data processing module performs phase space reconstruction on the reciprocating time characteristic data to obtain reconstructed reciprocating time characteristic data, specifically:
and solving the optimal delay time and the embedding dimension of the reciprocating time characteristic data by using an improved C-C method, and reconstructing the reciprocating time characteristic data into a multi-dimensional phase space by using a delay coordinate method according to the optimal delay time and the embedding dimension to obtain reconstructed reciprocating time characteristic data. It should be noted that the phase space reconstruction method in this section is the prior art disclosed in the art, and the process thereof is briefly described here:
(1) For the RTT monitoring time series [Figure DEST_PATH_IMAGE017], the correlation integral of the embedded time series is defined as:
[Figure DEST_PATH_IMAGE018]
where [Figure DEST_PATH_IMAGE019] is the embedding dimension, [Figure DEST_PATH_IMAGE020] is the optimal delay time, [Figure DEST_PATH_IMAGE005] is the index of the time point, [Figure DEST_PATH_IMAGE021] is the defined space radius, [Figure DEST_PATH_IMAGE022] is the step function, and [Figure DEST_PATH_IMAGE023] and [Figure DEST_PATH_IMAGE024] are two point vectors in the reconstructed phase space of the reciprocating time characteristic data.
(2) Construct the test statistic:
[Figure DEST_PATH_IMAGE025]
1. For computation, a block-averaging strategy is used and i is allowed to tend to positive infinity:
[Figure DEST_PATH_IMAGE026]
2. Select the two space radii [Figure DEST_PATH_IMAGE027] corresponding to the maximum and minimum of the reciprocating time characteristic data; there is no necessary size relationship between the two radii. Define [Figure DEST_PATH_IMAGE028] and [Figure DEST_PATH_IMAGE029] as the amounts of variation with respect to [Figure DEST_PATH_IMAGE021] under the same [Figure DEST_PATH_IMAGE019] and [Figure DEST_PATH_IMAGE020], respectively:
[Figure DEST_PATH_IMAGE030]
[Figure DEST_PATH_IMAGE031]
[Figure DEST_PATH_IMAGE032]
3. According to the BDS statistical theorem, reasonable estimates of [Figure DEST_PATH_IMAGE033] are obtained. In the embodiment of the present invention, the following are taken:
[Figure DEST_PATH_IMAGE034]
[Figure DEST_PATH_IMAGE035]
[Figure DEST_PATH_IMAGE036]
where [Figure DEST_PATH_IMAGE037] is the standard deviation of the time series and [Figure DEST_PATH_IMAGE038] = 1, 2, 3.
4. Calculate:
[Figure DEST_PATH_IMAGE039]
5. Compare [Figure DEST_PATH_IMAGE040] and [Figure DEST_PATH_IMAGE029] on the basis of the corresponding test statistics. In step 1, with [Figure DEST_PATH_IMAGE041] fixed and i tending to positive infinity, [Figure DEST_PATH_IMAGE029] shows ever-increasing high-frequency fluctuation as [Figure DEST_PATH_IMAGE020] increases. Under the same conditions, [Figure DEST_PATH_IMAGE040] and [Figure DEST_PATH_IMAGE029] follow the same overall fluctuation pattern, but with the high-frequency fluctuation of [Figure DEST_PATH_IMAGE029] removed, so [Figure DEST_PATH_IMAGE042] is selected as the optimal delay [Figure DEST_PATH_IMAGE020]. In addition, for an RTT monitoring time series with pseudo-period T, with [Figure DEST_PATH_IMAGE043] fixed and i tending to positive infinity, [Figure DEST_PATH_IMAGE044]; that is, the local maximum points of [Figure DEST_PATH_IMAGE045] are again [Figure DEST_PATH_IMAGE046], where c is an integer greater than zero. [Figure DEST_PATH_IMAGE047] therefore has obvious local peaks at the period points, and the period point found from [Figure DEST_PATH_IMAGE047] is used as the optimal embedding window. The embedding dimension m is then obtained by the formula [Figure DEST_PATH_IMAGE048].
6. Using the [Figure DEST_PATH_IMAGE049] found above, the delay coordinate method reconstructs the initial four-dimensional [Figure DEST_PATH_IMAGE016] RTT monitoring data into an m-dimensional phase space, with the matrix sequence expression:
[Figure DEST_PATH_IMAGE050]
where M is the number of delay vectors, [Figure DEST_PATH_IMAGE051], and [Figure DEST_PATH_IMAGE052] is the reconstructed reciprocating time characteristic data.
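The delay coordinate step and the correlation integral from step (1) can be sketched in their standard textbook form. This does not implement the improved C-C search for the optimal delay and embedding dimension: the values of `m` and `tau` below are arbitrary, a single scalar channel is embedded for simplicity, and the sup norm and the convention that the step function equals 1 at zero are assumptions.

```python
def delay_embed(series, m, tau):
    """Delay coordinate reconstruction of a scalar series.

    Returns the M = N - (m - 1) * tau delay vectors
    X_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)*tau}).
    """
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    return [[series[i + k * tau] for k in range(m)] for i in range(n_vectors)]


def correlation_integral(vectors, radius):
    """Fraction of distinct vector pairs whose sup-norm distance is <= radius."""
    count = len(vectors)
    if count < 2:
        raise ValueError("need at least two delay vectors")
    close = 0
    for i in range(count):
        for j in range(i + 1, count):
            dist = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if dist <= radius:  # step function theta(radius - dist)
                close += 1
    return 2.0 * close / (count * (count - 1))


# Hypothetical one-channel series; m and tau are illustrative, not optimal.
series = [float(x) for x in range(10)]
vectors = delay_embed(series, m=3, tau=2)
c_value = correlation_integral(vectors, radius=1.0)
```

In the patent's pipeline the same reconstruction would be applied to the four-dimensional characteristic data with the delay and embedding dimension chosen by the improved C-C method.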
The reconstructed reciprocating time characteristic data has the advantages of simple data form, small data quantity, strong characteristic representativeness and the like, so that in the network generation module, the reconstructed reciprocating time characteristic data can be used as training data to train the reciprocating time characteristic prediction neural network to obtain the trained reciprocating time characteristic prediction neural network.
Preferably, the reciprocating time characteristic prediction neural network adopts a long short-term memory (LSTM) neural network structure, is trained with the backpropagation through time (BPTT) algorithm, and uses the mean square error as its loss function.
The real-time decision module acquires real-time reconstructed reciprocating time characteristic data of each Docker cluster. To do so, the reciprocating time data at the current moment and at adjacent moments are assembled into a data matrix, and the dimension reduction and phase space reconstruction described above are applied to it. The real-time reconstructed reciprocating time characteristic data are then input into the reciprocating time characteristic prediction neural network to obtain the predicted reciprocating time characteristic data at a future moment. Because the service data processing environment runs stably most of the time, occasional abnormal characteristics need to be detected so that multi-channel data requests can be reduced in advance, avoiding the high delay caused by hardware resource preemption.
The real-time decision module obtains affinity of each Docker cluster according to difference of the predicted reciprocating time characteristic data between each Docker cluster and other Docker clusters, and specifically comprises the following steps:
The cosine distance between the predicted reciprocating time characteristic data of the target Docker cluster and that of each other Docker cluster is obtained, and the reciprocal of the target Docker cluster's average cosine distance is taken as its affinity. A greater affinity indicates that the Docker cluster is behaving more normally.
The affinity of each Docker cluster is computed; each Docker cluster whose affinity is smaller than a preset affinity threshold is taken as an abnormal cluster at the future moment, and the amount of data allocated to it is reduced.
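The affinity rule can be sketched as follows. The cluster names, the predicted vectors and the threshold value are hypothetical, and the small `eps` term is an addition of this sketch to avoid division by zero when all predictions coincide; vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def affinities(predictions, eps=1e-12):
    """Reciprocal of each cluster's mean cosine distance to the other clusters.

    predictions: {cluster_name: predicted feature vector}.
    eps is an assumption of this sketch, not part of the source.
    """
    result = {}
    for name, vec in predictions.items():
        dists = [cosine_distance(vec, other)
                 for other_name, other in predictions.items()
                 if other_name != name]
        result[name] = 1.0 / (sum(dists) / len(dists) + eps)
    return result


def abnormal_clusters(predictions, threshold):
    """Names of clusters whose affinity is below the preset threshold."""
    scores = affinities(predictions)
    return sorted(name for name, score in scores.items() if score < threshold)


# Hypothetical predicted feature vectors for four clusters.
predictions = {
    "cluster-a": [1.0, 0.0],
    "cluster-b": [1.0, 0.1],
    "cluster-c": [0.0, 1.0],   # points in a different direction
    "cluster-d": [1.0, -0.1],
}
scores = affinities(predictions)
flagged = abnormal_clusters(predictions, threshold=2.0)
```

A cluster whose prediction diverges from the rest has a large mean cosine distance and therefore a small affinity, which is what the threshold test picks up.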
In the embodiment of the invention, a load balancer such as NGINX is controlled to lower the distribution weight of the abnormal cluster when receiving HTTP requests, which dynamically lightens the load on the bare metal server hosting the abnormal cluster and prevents unreliable hardware resource preemption.
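One way to picture the weight reduction is to rescale per-cluster weights and render them as an NGINX `upstream` block, whose `server` directive accepts a `weight=` parameter. The source does not state by how much the weight is lowered, so the halving factor, the minimum weight, the upstream name and the addresses below are all hypothetical.

```python
def reduce_weights(weights, abnormal, factor=0.5, min_weight=1):
    """Scale down the integer weight of each abnormal cluster.

    factor and min_weight are illustrative assumptions.
    """
    return {
        name: max(min_weight, int(weight * factor)) if name in abnormal else weight
        for name, weight in weights.items()
    }


def render_upstream(name, servers, weights):
    """Render an NGINX upstream block with one weighted server per cluster."""
    lines = [f"upstream {name} {{"]
    for cluster, address in servers.items():
        lines.append(f"    server {address} weight={weights[cluster]};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical clusters and addresses.
servers = {
    "cluster-a": "10.0.0.1:8080",
    "cluster-b": "10.0.0.2:8080",
    "cluster-c": "10.0.0.3:8080",
}
weights = {"cluster-a": 10, "cluster-b": 10, "cluster-c": 10}
new_weights = reduce_weights(weights, abnormal={"cluster-c"})
config = render_upstream("decision_backend", servers, new_weights)
```

The rendered text would still have to be written to the NGINX configuration and reloaded by whatever deployment mechanism is in use; that part is outside this sketch.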
In summary, in the embodiment of the present invention, the node characteristic information extraction module extracts the reciprocating time that each service node in each Docker cluster needs to process data, and the balance of the cluster is obtained from the variance of the reciprocating times within the cluster. The data processing module reduces the dimension of the reciprocating time data, merges the dimension reduction data with the balance data, and then obtains reconstructed reciprocating time characteristic data by phase space reconstruction. The network generation module trains a reciprocating time characteristic prediction neural network on the reconstructed reciprocating time characteristic data. The real-time decision module predicts the reciprocating time characteristic data at a future moment from real-time data, identifies abnormal clusters from the differences between the predicted reciprocating time characteristic data of the clusters, and reduces the amount of data allocated to the abnormal clusters. The embodiment of the invention thereby makes highly available, highly real-time data allocation decisions across multiple channels and prevents anomalies during real-time big data analysis.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. The processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A multi-channel available big data real-time decision making system, the system comprising:
the node characteristic information extraction module is used for acquiring the reciprocating time required by each service node in each Docker cluster for processing data, and acquiring a reciprocating time sequence corresponding to the Docker cluster; and taking the variance of the reciprocating time sequence as the data processing balance of the Docker cluster;
the data processing module is used for reducing the dimension of the reciprocating time sequence data at continuous moments according to a preset expected dimension to obtain dimension reduction data; merging the dimension reduction data and the balance into new reciprocating time characteristic data; and performing phase space reconstruction on the reciprocating time characteristic data to obtain reconstructed reciprocating time characteristic data;
the network generation module is used for training the reciprocating time characteristic prediction neural network using a large amount of reconstructed reciprocating time characteristic data as training data;
the real-time decision module is used for acquiring real-time reconstructed reciprocating time characteristic data of each Docker cluster, inputting the real-time reconstructed reciprocating time characteristic data into the reciprocating time characteristic prediction neural network, and obtaining predicted reciprocating time characteristic data at a future moment; obtaining the affinity of each Docker cluster according to the difference between the predicted reciprocating time characteristic data of that Docker cluster and those of the other Docker clusters; taking each Docker cluster whose affinity is smaller than a preset affinity threshold as an abnormal cluster at the future moment; and reducing the amount of data allocated to the abnormal cluster.
2. The multi-channel available big data real-time decision making system according to claim 1, wherein reducing the dimension of the reciprocating time sequence according to a preset expected dimension to obtain the dimension reduction data comprises:
reducing the dimension of the reciprocating time sequence by kernel principal component analysis with an RBF kernel function.
3. The multi-channel available big data real-time decision making system according to claim 1, wherein the performing phase space reconstruction on the reciprocating time feature data to obtain reconstructed reciprocating time feature data comprises:
solving the optimal delay time and the embedding dimension of the reciprocating time characteristic data by using an improved C-C method, and reconstructing the reciprocating time characteristic data into a multi-dimensional phase space by the delay coordinate method according to the optimal delay time and the embedding dimension, to obtain the reconstructed reciprocating time characteristic data.
4. The multi-channel available big data real-time decision making system according to claim 1, wherein the reciprocating time characteristic prediction neural network adopts an LSTM neural network structure and is trained by the backpropagation through time (BPTT) algorithm, and the loss function is a mean square error function.
5. The multi-channel available big data real-time decision making system according to claim 1, wherein the obtaining affinity of each Docker cluster according to the difference of the predicted reciprocation time feature data of each Docker cluster and other Docker clusters comprises:
obtaining cosine distances between the predicted reciprocating time characteristic data of the target Docker cluster and that of each of the other Docker clusters; and taking the reciprocal of the average cosine distance of the target Docker cluster as the affinity of the target Docker cluster.
CN202210838728.6A 2022-07-18 2022-07-18 Multi-channel available big data real-time decision-making system Active CN114996022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210838728.6A CN114996022B (en) 2022-07-18 2022-07-18 Multi-channel available big data real-time decision-making system

Publications (2)

Publication Number Publication Date
CN114996022A (en) 2022-09-02
CN114996022B (en) 2024-03-08

Family

ID=83021114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210838728.6A Active CN114996022B (en) 2022-07-18 2022-07-18 Multi-channel available big data real-time decision-making system

Country Status (1)

Country Link
CN (1) CN114996022B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100092470A1 (en) * 2008-09-22 2010-04-15 Icb International, Inc. Antibodies, analogs and uses thereof
WO2013110147A1 (en) * 2011-12-30 2013-08-01 Embrapa - Empresa Brasileira De Pesquisa Agropecuária Computer -aided design of new alpha-amylase inhibitors
CN104184813A (en) * 2014-08-20 2014-12-03 杭州华为数字技术有限公司 Load balancing method of virtual machines, related equipment and trunking system
CN106095639A (en) * 2016-05-30 2016-11-09 中国农业银行股份有限公司 A kind of cluster subhealth state method for early warning and system
WO2017072854A1 (en) * 2015-10-27 2017-05-04 株式会社日立製作所 Monitoring device, monitoring system and monitoring method
CN108123983A (en) * 2016-11-30 2018-06-05 深圳联友科技有限公司 A kind of method for caching and processing and system of affinity load cluster
CN110287233A (en) * 2019-06-18 2019-09-27 华北电力大学 A kind of system exception method for early warning based on deep learning neural network
US20200067969A1 (en) * 2018-08-22 2020-02-27 General Electric Company Situation awareness and dynamic ensemble forecasting of abnormal behavior in cyber-physical system
CN110933139A (en) * 2019-11-05 2020-03-27 浙江工业大学 System and method for solving high concurrency of Web server
CN111914875A (en) * 2020-06-05 2020-11-10 华南理工大学 Fault early warning method of rotating machinery based on Bayesian LSTM model
CN114711784A (en) * 2022-05-11 2022-07-08 遵义医科大学附属医院 Heart transplantation postoperation electrocardiogram abnormity analysis method and system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
夏军等: "一种Femtocell网络中的负载预测方法", 《重庆邮电大学学报(自然科学版)》, no. 03, 15 June 2019 (2019-06-15) *
张鸿等: "《基于人工智能的多媒体数据挖掘和应用实例》", 31 January 2018, 武汉大学出版社, pages: 62 *
韩敏等: "混沌时间序列分析与预测研究综述", 《信息与控制》, no. 01, 15 February 2020 (2020-02-15) *
黄成兵: "一种多层次分布式数据挖掘方法的改进研究", 《现代电子技术》, no. 09, 1 May 2017 (2017-05-01) *

Also Published As

Publication number Publication date
CN114996022B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
He et al. QoE-driven content-centric caching with deep reinforcement learning in edge-enabled IoT
Hou et al. Distredge: Speeding up convolutional neural network inference on distributed edge devices
CN111444021A (en) Synchronous training method, server and system based on distributed machine learning
CN112398700B (en) Service degradation method and device, storage medium and computer equipment
CN115309605A (en) Big data based anomaly monitoring method and device
Sudharsan et al. Toward distributed, global, deep learning using iot devices
Qin et al. Federated full-parameter tuning of billion-sized language models with communication cost under 18 kilobytes
Tu et al. An optimized cluster storage method for real-time big data in Internet of Things
CN106201839A (en) The information loading method of a kind of business object and device
Li et al. Parallel skyline queries over uncertain data streams in cloud computing environments
CN109063752B (en) Multi-source high-dimensional multi-scale real-time data stream sorting method based on neural network
Daghistani et al. Swarm: Adaptive load balancing in distributed streaming systems for big spatial data
CN114996022B (en) Multi-channel available big data real-time decision-making system
US8849745B2 (en) Decision support methods and apparatus
CN117081996B (en) Flow control method based on server-side real-time feedback and soft threshold and related equipment
Li et al. H-BILSTM: a novel bidirectional long short term memory network based intelligent early warning scheme in mobile edge computing (MEC)
EP4030290A1 (en) Management computer, management system, and recording medium
Liu et al. Modeling and optimizing the scaling performance in distributed deep learning training
CN115333922A (en) Operation and maintenance support network alarm data mining method, system and storage medium
CN114637809A (en) Method, device, electronic equipment and medium for dynamic configuration of synchronous delay time
CN112488169A (en) PCA-based massive Linux system operation and maintenance data dimension reduction method
Deng et al. TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
CN115545364A (en) Node evaluation method and device, electronic equipment and readable storage medium
Yang et al. Efficient Edge Data Management Framework for IIoT via Prediction-Based Data Reduction
CN113114661A (en) Cloud-edge collaborative lightweight data processing method for intelligent building Internet of things equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230330

Address after: Room 28, 29 and 30, third floor, podium building, No.2, incubation center, Henan National University Science Park, No.11, Changchun Road, high tech Industrial Development Zone, Zhengzhou City, Henan Province, 450000

Applicant after: Zhengzhou Maitou Information Technology Co.,Ltd.

Address before: 314001 Room 601, building 15, No. 36, Changsheng South Road, Nanhu District, Jiaxing City, Zhejiang Province

Applicant before: ZHEJIANG CHUHAI DIGITAL TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20240130

Address after: Room 702, Unit 3, Building 1, No. 140 Qianfeng South Road, Wanbailin District, Taiyuan City, Shanxi Province, 030000

Applicant after: Shanxi Huamei Far East Technology Co.,Ltd.

Country or region after: China

Address before: Room 28, 29 and 30, third floor, podium building, No.2, incubation center, Henan National University Science Park, No.11, Changchun Road, high tech Industrial Development Zone, Zhengzhou City, Henan Province, 450000

Applicant before: Zhengzhou Maitou Information Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant