CN111431955B - Streaming data processing system and method - Google Patents
Streaming data processing system and method Download PDFInfo
- Publication number
- CN111431955B CN111431955B CN201910023008.2A CN201910023008A CN111431955B CN 111431955 B CN111431955 B CN 111431955B CN 201910023008 A CN201910023008 A CN 201910023008A CN 111431955 B CN111431955 B CN 111431955B
- Authority
- CN
- China
- Prior art keywords
- data
- platform
- information
- scheduling
- pod
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0896—Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a streaming data processing system and a method, wherein the system comprises: the leading system is used for receiving the data, forwarding the data to the Kafka platform and sending a notice to the scheduling system; the Kafka platform is used for storing the data forwarded by the leading system in pre-allocated topic; the scheduling system is used for generating a scheduling notification according to the information of the Kafka platform and the state information in the etcd system and sending the scheduling notification to the Kubernetes platform after receiving the notification of the leading system; the etcd system is used for updating and storing the state information according to the report of the Kubernets platform; the Kubernetes platform is used to control, generate or adjust the pod according to the scheduling notification and consume the data in the corresponding topic. The invention realizes a uniform API interface for receiving data stream, and the dispatching system can be pertinently started, closed, expanded and contracted according to the load condition of the container, thereby realizing elastic and flexible data processing.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a streaming data processing system and method.
Background
Kubernets, as an opening source container orchestration engine, is used for managing containerized applications on multiple hosts in a cloud platform, and provides mechanisms for application deployment, planning, updating, and maintenance, with the goal of making it simple and efficient to deploy containerized applications. In a Kubernetes-based system, a client sends a Deployment to an ApiServer, and the ApiServer receives a request of a client and stores resource contents into a database (etcd); a Controller component (comprising scheduler, replication and endpoint) monitors resource changes and reacts; the replicase checks database changes and creates a desired number of pod instances; the Scheduler checks the database change again, finds the Pod which is not distributed to a specific execution node (node), then distributes the Pod to the nodes which can operate the Pod according to a group of related rules, updates the database and records the Pod distribution condition; kubelete monitors database changes, manages the lifecycle of subsequent pods, and discovers those pods that are assigned to run on the node where it is located; if a new pod is found, then this new pod will be run on the node; the kuberproxy runs on each host of the cluster and manages network communication, such as service discovery and load balancing. E.g., when data is sent to the host, it is routed to the correct pod or container. For data sent from a host, it may discover the remote server based on the request address and route the data correctly, in some cases using a Round-robin scheduling algorithm (Round-robin) to send the request to multiple instances in the cluster.
However, the Kubernetes-based system described above suffers from the following drawbacks: the pressure at the rear end cannot be dynamically sensed and the elastic expansion and contraction are realized; no API interface accesses the data stream.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art or the related art and provides an elastic, flexible and efficient data processing scheme.
To this end, according to a first aspect of the present invention, there is provided a streaming data processing system, characterized by comprising:
the leading system is used for receiving the data, forwarding the data to the Kafka platform and sending a notice to the scheduling system;
the Kafka platform is used for storing the data forwarded by the leading system in pre-allocated topic;
the scheduling system is used for generating a scheduling notice according to the information of the Kafka platform and the state information in the distributed service registration (etcd) system and sending the scheduling notice to the Kubernetes platform after receiving the notice of the approach system;
a distributed service registration (etcd) system for updating and storing the state information according to the report of the Kubernets platform;
and the Kubernetes platform is used for controlling, generating or adjusting the pod according to the scheduling notification and consuming the data in the corresponding topic.
Further, the connection system comprises an nginx module and a lua module, wherein the nginx module is used for receiving the data, and the lua module is used for analyzing address information of the data and forwarding the data to a topic corresponding to the Kafka platform according to the address information.
Further, the state information in the etcd system includes kubernets platform running state information, a service process ID, and a Pod ID, the information of the Kafka platform includes topoic information and a statistical count, and the scheduling notification includes data in the topoc indicating which Pod consumes the corresponding scheduling notification.
Further, the status information in the etcd system further includes a load status, the information of the Kafka platform further includes a received data amount, and the scheduling notification further includes a resource indicating to generate a new pod or to indicate which pod to adjust and how to adjust.
Further, the data is transmitted by adopting a udp protocol in the forwarding and consumption of the data.
According to a second aspect of the present invention, there is provided a streaming data processing method, comprising:
the leading system receives the data, forwards the data to the Kafka platform and sends a notice to the dispatching system;
the Kafka platform stores the data forwarded by the leading system in pre-allocated topic;
after receiving the notification of the approach system, the scheduling system acquires the information of the Kafka platform and the state information in a distributed service registration (etcd) system, generates a scheduling notification and sends the scheduling notification to the Kubernetes platform;
and the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification and consumes the data in the corresponding topic.
Further, the leading system comprises an nginx module and a lua module, wherein the nginx module receives the data, and the lua module analyzes address information of the data and forwards the data to the topic corresponding to the Kafka platform according to the address information.
Further, the state information in the etcd system includes kubernets platform running state information, a service process ID, and a Pod ID, the information of the Kafka platform includes topoic information and a statistical count, and the scheduling notification includes data in the topoc indicating which Pod consumes the corresponding scheduling notification.
Further, the status information in the etcd system further includes a load status, the information of the Kafka platform further includes a received data amount, and the scheduling notification further includes a resource indicating to generate a new pod or to indicate which pod to adjust and how to adjust.
Further, the data is transmitted by adopting a udp protocol in the forwarding and consumption of the data.
The invention achieves the function of data parallel access by adding an independent process (leading system), and realizes a uniform API interface for receiving data stream; data are classified and forwarded through a nginx + lua module architecture, and dynamic load balancing is achieved; the state update of the Kubernetes platform is received through the etcd system, and the rear-end pressure can be sensed in time; the container is controlled and adjusted by the scheduling system according to the data change condition in the Kafka platform and the container state in the Kubernetes platform, so that the container can be pertinently started, closed, expanded and contracted according to the load condition of the container, and elastic and flexible data processing is realized.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram of a streaming data processing system according to the present invention;
fig. 2 is a flow chart of a streaming data processing method according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
The invention is improved on the basis of the existing Kubernetes platform, and a data access program at the front end is added to access data flow and carry out classification processing, thereby realizing uniform API interface and dynamic load balance; and a scheduling system is added to realize the elastic expansion and contraction of the container in the Kubernetes platform. The following describes the aspects of the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, there is shown a streaming data processing system according to the present invention, comprising a docking system 11, a Kafka platform 12, a scheduling system 13, a distributed service registration (etcd) system 14, and a kubernets platform 15. The leading system 11 is configured to receive the data, forward the data to the Kafka platform 12, and send a notification to the scheduling system; the Kafka platform 12 is used for storing the data forwarded by the leading system 11 in pre-allocated topic; the scheduling system 13 is configured to generate a scheduling notification according to the information of the Kafka platform 12 and the state information in the etcd system 14 after receiving the notification of the connection system 11, and send the scheduling notification to the kubernets platform 15; the etcd system 14 is configured to update and store the state information according to the report of the kubernets platform 15; the kubernets platform 15 is used to control, generate or adjust the pod according to the scheduling notification, and consume the data in the corresponding topic. Wherein the pod comprises one or more containers for processing the business data; the "consumption" refers to that the pod acquires and processes the data in the corresponding topic.
Optionally, the guidance system 11 includes a nginx module 111 and a lua module 112, where the nginx module 111 is configured to receive the data, and the lua module 112 is configured to analyze header information of the data, classify the data according to IP address information in the header, and forward the data to a topic corresponding to the Kafka platform according to a predefined routing policy. Thus, when different types of data streams are accessed, data of different ip sources can be classified and distributed.
Optionally, the state information in the etcd system 14 includes kubernets platform operation state information, a service process ID, and a Pod ID, the information of the Kafka platform 12 includes topic information and a statistical count, and the scheduling system 13 generates a scheduling notification according to the information, where the notification includes data in the topic indicating which Pod consumes the corresponding Pod, so that the kubernets platform 15 may control the indicated Pod to consume the data in the topic.
Optionally, the state information in the etcd system 14 further includes a load state, the information of the Kafka platform 12 further includes a received data volume, and the scheduling system 13 generates a scheduling notification according to the information, where the notification includes a resource indicating to generate a new pod or to indicate which pod is to be adjusted and how to adjust, so that the kubernets platform 15 may generate a new pod to consume data in the corresponding topic, or close, expand, and shrink the resource of the indicated pod, for example, a CPU, a memory, and a storage space for the pod, so that resource isolation, dynamic elastic scaling, and migration of each application are ensured.
The information related to the kubernets platform 15 will be uploaded to the etcd system 14 periodically or aperiodically, and the status information stored in the etcd system 14 will be updated in real time.
Optionally, the udp protocol is used for transmitting data in the forwarding and consumption of the data, so that the data transmission concurrency is improved, and the transmission capability of 10000 pieces of data per second of each node can be achieved.
Referring to fig. 2, there is shown a streaming data processing method according to the present invention, comprising:
s21, the leading system receives the data, forwards the data to a Kafka platform and sends a notice to a scheduling system;
s22, the Kafka platform stores the data forwarded by the leading system in pre-distributed topic;
s23, the scheduling system acquires the information of the Kafka platform and the state information in the etcd system, generates a scheduling notice and sends the scheduling notice to the Kubernetes platform;
and S24, the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification, and consumes the data in the corresponding topic.
Optionally, the guidance system includes a nginx module and a lua module, the nginx module receives the data, and the lua module analyzes address information of the data and forwards the data to a topic corresponding to the Kafka platform according to the address information.
Optionally, the state information in the etcd system includes kubernets platform operation state information, a service process ID, and a Pod ID, the information of the Kafka platform includes topic information and a statistical count, and the scheduling notification includes a notification indicating which Pod consumes data in the corresponding topic.
Optionally, the status information in the etcd system further includes a load status, the information of the Kafka platform further includes a received data volume, and the scheduling notification further includes a resource indicating which pod is to be generated or adjusted, and how to adjust.
Optionally, the data is transmitted by using udp protocol in the forwarding and consumption of the data.
Therefore, the invention realizes a uniform API for receiving data stream, and realizes dynamic load balance by classifying and forwarding data; the rear-end pressure can be sensed in time through the arrangement of the etcd system, the scheduling system can be started, closed, expanded and contracted according to the load condition of the container in a targeted manner, and elastic and flexible data processing is realized.
The following illustrates the specific implementation of the present invention by way of an example:
the collector sends streaming data;
the nginx module in the leading-in system receives the data and sends the data to the lua module, the lua module analyzes message header information of the data, IP address information of the data is obtained, topic corresponding to the data of the IP address is determined according to a predefined routing strategy, a Kafka client API is called to forward the data to topic corresponding to a Kafka platform, and a dispatching system is informed;
after receiving the notification, the scheduling system acquires topic information, statistical count and received data volume from the Kafka platform, acquires state information of the Kubernets platform registered in the etcd from the etcd system, wherein the state information comprises running state information of the Kubernets platform, service process ID, pod ID and load state, and generates a scheduling notification according to the information;
if no container is determined to be processed according to the topoc information, the business process ID and the pod ID, indicating generation of a new pod for processing data in the corresponding topoc in the scheduling notification;
if the container resource is determined to be insufficient according to the load state and the received data volume, indicating to expand the resource of the pod in the scheduling notification, for example, increasing the memory of the pod;
if the container resource is determined to be excessive according to the load state and the received data volume, the resource of the shrinkage pod is indicated in the scheduling notification, for example, the CPU occupancy rate of the pod is reduced;
and the Kubernets platform controls, generates or adjusts the pod according to the scheduling notification, and acquires and processes the corresponding service data in the topic through the specified pod.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium. The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A streaming data processing system, comprising:
the leading system is used for receiving the data, forwarding the data to the Kafka platform and sending a notice to the scheduling system;
the Kafka platform is used for storing the data forwarded by the leading system in pre-allocated topic;
the scheduling system is used for generating a scheduling notice according to the information of the Kafka platform and the state information in the distributed service registration (etcd) system and sending the scheduling notice to the Kubernetes platform after receiving the notice of the approach system;
a distributed service registration (etcd) system, which is used for updating and storing the state information according to the report of a Kubernets platform;
and the Kubernetes platform is used for controlling, generating or adjusting the pod according to the scheduling notification and consuming the data in the corresponding topic.
2. The streaming data processing system according to claim 1, wherein the docking system comprises a nginx module and a lua module, the nginx module is configured to receive the data, and the lua module is configured to analyze address information of the data and forward the data to a topic corresponding to the Kafka platform according to the address information.
3. The streaming data processing system of claim 1, wherein the state information in the etcd system includes kubernets platform running state information, a service process ID, and a podID, wherein the Kafka platform information includes topic information and a statistical count, and wherein the scheduling notification includes an indication of which pod consumed the data in the corresponding topic.
4. The streaming data processing system of claim 3, wherein the state information in the etcd system further comprises a load state, wherein the Kafka platform information further comprises a received data volume, and wherein the scheduling notification further comprises a resource indicating which pod to generate a new pod or which pod to adjust and how to adjust.
5. The streaming data processing system of any of claims 1-4, wherein the data is transmitted using udp protocol in the forwarding and consumption of the data.
6. A streaming data processing method, comprising:
the leading system receives the data, forwards the data to the Kafka platform and sends a notice to the dispatching system;
the Kafka platform stores the data forwarded by the leading system in pre-allocated topic;
after receiving the notification of the approach system, the scheduling system acquires the information of the Kafka platform and the state information in a distributed service registration (etcd) system, generates a scheduling notification and sends the scheduling notification to the Kubernetes platform;
and the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification and consumes the data in the corresponding topic.
7. The method according to claim 6, wherein the docking system comprises a nginx module and a lua module, the nginx module receives the data, and the lua module analyzes address information of the data and forwards the data to a topic corresponding to the Kafka platform according to the address information.
8. The method of claim 6, wherein the status information in the etcd system comprises Kubernets platform running status information, service process IDs and podIDs, wherein the Kafka platform information comprises topoic information and statistical counts, and wherein the scheduling notification comprises information indicating which pod consumed the data in the corresponding topoc.
9. The method of claim 8, wherein the status information in the etcd system further comprises a load status, wherein the Kafka platform information further comprises a received data volume, and wherein the scheduling notification further comprises a resource indicating which pod is to be adjusted or a new pod is to be generated.
10. Method according to any of claims 6-9, wherein the data is transmitted using udp protocol in the forwarding and consumption of the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910023008.2A CN111431955B (en) | 2019-01-10 | 2019-01-10 | Streaming data processing system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910023008.2A CN111431955B (en) | 2019-01-10 | 2019-01-10 | Streaming data processing system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111431955A CN111431955A (en) | 2020-07-17 |
CN111431955B true CN111431955B (en) | 2023-03-24 |
Family
ID=71546067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910023008.2A Active CN111431955B (en) | 2019-01-10 | 2019-01-10 | Streaming data processing system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111431955B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065785B (en) * | 2021-04-13 | 2024-02-20 | 国网江苏省电力有限公司信息通信分公司 | Dynamic resource expansion method for electric power Internet of things management platform |
CN114048108A (en) * | 2022-01-12 | 2022-02-15 | 中科星图智慧科技有限公司 | Automatic treatment method and device for multi-source heterogeneous data |
CN114553866B (en) * | 2022-01-19 | 2024-09-17 | 深圳力维智联技术有限公司 | Full data access method and device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105897946A (en) * | 2016-04-08 | 2016-08-24 | 北京搜狐新媒体信息技术有限公司 | Obtaining method and system of access address |
CN106888254A (en) * | 2017-01-20 | 2017-06-23 | 华南理工大学 | A kind of exchange method between container cloud framework based on Kubernetes and its each module |
CN108304267A (en) * | 2018-01-31 | 2018-07-20 | 中科边缘智慧信息科技(苏州)有限公司 | The multi-source data of highly reliable low-resource expense draws the method for connecing |
CN108769100A (en) * | 2018-04-03 | 2018-11-06 | 郑州云海信息技术有限公司 | A kind of implementation method and its device based on kubernetes number of containers elastic telescopics |
CN109151464A (en) * | 2018-11-14 | 2019-01-04 | 江苏鸿信系统集成有限公司 | IPTV set top box failure real-time detection method based on high amount of traffic processing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9239997B2 (en) * | 2013-03-15 | 2016-01-19 | The United States Of America As Represented By The Secretary Of The Navy | Remote environmental and condition monitoring system |
-
2019
- 2019-01-10 CN CN201910023008.2A patent/CN111431955B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105897946A (en) * | 2016-04-08 | 2016-08-24 | 北京搜狐新媒体信息技术有限公司 | Obtaining method and system of access address |
CN106888254A (en) * | 2017-01-20 | 2017-06-23 | 华南理工大学 | A kind of exchange method between container cloud framework based on Kubernetes and its each module |
CN108304267A (en) * | 2018-01-31 | 2018-07-20 | 中科边缘智慧信息科技(苏州)有限公司 | The multi-source data of highly reliable low-resource expense draws the method for connecing |
CN108769100A (en) * | 2018-04-03 | 2018-11-06 | 郑州云海信息技术有限公司 | A kind of implementation method and its device based on kubernetes number of containers elastic telescopics |
CN109151464A (en) * | 2018-11-14 | 2019-01-04 | 江苏鸿信系统集成有限公司 | IPTV set top box failure real-time detection method based on high amount of traffic processing |
Also Published As
Publication number | Publication date |
---|---|
CN111431955A (en) | 2020-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9154382B2 (en) | Information processing system | |
US11336580B2 (en) | Methods, apparatuses and computer program products for transmitting data | |
JP6592595B2 (en) | Method and system for managing data traffic in a computing network | |
US8082364B1 (en) | Managing state information in a computing environment | |
CN101370035B (en) | Method and system for dynamic client/server network management using proxy servers | |
TWI345397B (en) | Method and system for stale data detection based quality of service | |
US7630379B2 (en) | Systems and methods for improved network based content inspection | |
CN102763380B (en) | For the system and method for routing packets | |
CN102771094B (en) | Distributed routing framework | |
EP2163072B1 (en) | Content-based routing | |
CN111431955B (en) | Streaming data processing system and method | |
US20030229674A1 (en) | Internet scaling in a PUB/SUB ENS | |
GB2439195A (en) | Self-managed distributed mediation networks | |
US12126529B2 (en) | Multi-tier deterministic networking | |
CN108123878B (en) | Routing method, routing device and data forwarding equipment | |
CN105376292A (en) | Explicit strategy feedback in name-based forwarding | |
JP6256343B2 (en) | Network, network node, distribution method, and network node program | |
CN115665162A (en) | Intelligent shunting engine for gray scale release | |
Kamiyama et al. | Cache replacement based on distance to origin servers | |
US8289872B2 (en) | System and method for assigning information categories to multicast groups | |
US7574525B2 (en) | System and method for managing communication between server nodes contained within a clustered environment | |
CN103442257A (en) | Method, device and system for achieving flow resource management | |
US8233480B2 (en) | Network clustering | |
CN111970149A (en) | Shared bandwidth realizing method based on hardware firewall QOS | |
Sedaghat et al. | R2T-DSDN: reliable real-time distributed controller-based SDN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |