
CN111431955B - Streaming data processing system and method - Google Patents

Streaming data processing system and method

Info

Publication number
CN111431955B
CN111431955B
Authority
CN
China
Prior art keywords
data
platform
information
scheduling
pod
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910023008.2A
Other languages
Chinese (zh)
Other versions
CN111431955A (en)
Inventor
毕俊
张敬亮
傅伟康
马银龙
王千一
王晓东
杨光辉
张家旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Star Map Co ltd
Original Assignee
Zhongke Star Map Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Star Map Co ltd filed Critical Zhongke Star Map Co ltd
Priority to CN201910023008.2A priority Critical patent/CN111431955B/en
Publication of CN111431955A publication Critical patent/CN111431955A/en
Application granted granted Critical
Publication of CN111431955B publication Critical patent/CN111431955B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0896 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a streaming data processing system and method, wherein the system comprises: the leading system, used for receiving data, forwarding the data to the Kafka platform and sending a notification to the scheduling system; the Kafka platform, used for storing the data forwarded by the leading system in a pre-allocated topic; the scheduling system, used for, after receiving the notification of the leading system, generating a scheduling notification according to the information of the Kafka platform and the state information in the etcd system and sending the scheduling notification to the Kubernetes platform; the etcd system, used for updating and storing the state information according to the reports of the Kubernetes platform; and the Kubernetes platform, used for controlling, generating or adjusting pods according to the scheduling notification and consuming the data in the corresponding topic. The invention realizes a uniform API interface for receiving data streams, and the scheduling system can start, close, expand and shrink containers in a targeted manner according to container load, thereby realizing elastic and flexible data processing.

Description

Streaming data processing system and method
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a streaming data processing system and method.
Background
Kubernetes, as an open-source container orchestration engine, is used for managing containerized applications on multiple hosts in a cloud platform, and provides mechanisms for application deployment, planning, updating, and maintenance, with the goal of making it simple and efficient to deploy containerized applications. In a Kubernetes-based system, a client sends a Deployment to the ApiServer; the ApiServer receives the client request and stores the resource content in a database (etcd); a Controller component (comprising the scheduler, replication and endpoint controllers) monitors resource changes and reacts; the ReplicaSet controller checks the database changes and creates the desired number of pod instances; the Scheduler checks the database changes again, finds pods that have not been assigned to a specific execution node (node), assigns them to nodes that can run them according to a set of related rules, updates the database, and records the pod assignment; the Kubelet monitors database changes, manages the lifecycle of the pods assigned to run on its node and, when a new pod is found, runs that pod on the node; kube-proxy runs on each host of the cluster and manages network communication, such as service discovery and load balancing. For example, when data is sent to the host, it is routed to the correct pod or container; for data sent from a host, kube-proxy may discover the remote server based on the request address and route the data correctly, in some cases using a Round-robin scheduling algorithm to send requests to multiple instances in the cluster.
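For readers unfamiliar with this control loop, the following is a minimal sketch (not part of the patent) of how a component can watch pod changes and react, using the official kubernetes Python client and assuming cluster credentials are available through a local kubeconfig; it illustrates the same watch-and-react pattern followed by the Scheduler, the ReplicaSet controller and the Kubelet described above.

    # Minimal sketch: watch pod events the way Kubernetes controllers do.
    # Assumes the "kubernetes" Python client is installed and a kubeconfig exists.
    from kubernetes import client, config, watch

    config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
    v1 = client.CoreV1Api()

    w = watch.Watch()
    for event in w.stream(v1.list_pod_for_all_namespaces, timeout_seconds=30):
        pod = event["object"]
        # A real controller would react here: assign a node, start containers, etc.
        print(event["type"], pod.metadata.namespace, pod.metadata.name, pod.status.phase)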
However, the Kubernetes-based system described above suffers from the following drawbacks: it cannot dynamically sense the back-end pressure and scale elastically, and it provides no unified API interface for accessing the data stream.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art or the related art and provides an elastic, flexible and efficient data processing scheme.
To this end, according to a first aspect of the present invention, there is provided a streaming data processing system, characterized by comprising:
the leading system is used for receiving the data, forwarding the data to the Kafka platform and sending a notification to the scheduling system;
the Kafka platform is used for storing the data forwarded by the leading system in a pre-allocated topic;
the scheduling system is used for, after receiving the notification of the leading system, generating a scheduling notification according to the information of the Kafka platform and the state information in the distributed service registration (etcd) system and sending the scheduling notification to the Kubernetes platform;
a distributed service registration (etcd) system, for updating and storing the state information according to the reports of the Kubernetes platform;
and the Kubernetes platform is used for controlling, generating or adjusting the pod according to the scheduling notification and consuming the data in the corresponding topic.
Further, the leading system comprises an nginx module and a lua module, wherein the nginx module is used for receiving the data, and the lua module is used for analyzing the address information of the data and forwarding the data to the corresponding topic of the Kafka platform according to the address information.
Further, the state information in the etcd system includes Kubernetes platform running state information, a service process ID and a pod ID, the information of the Kafka platform includes topic information and a statistical count, and the scheduling notification includes an indication of which pod consumes the data in the corresponding topic.
Further, the state information in the etcd system further includes a load state, the information of the Kafka platform further includes a received data volume, and the scheduling notification further includes an indication to generate a new pod, or an indication of which pod's resources to adjust and how to adjust them.
Further, the data is transmitted using the UDP protocol in the forwarding and consumption of the data.
According to a second aspect of the present invention, there is provided a streaming data processing method, comprising:
the leading system receives the data, forwards the data to the Kafka platform and sends a notification to the scheduling system;
the Kafka platform stores the data forwarded by the leading system in a pre-allocated topic;
after receiving the notification of the leading system, the scheduling system acquires the information of the Kafka platform and the state information in the distributed service registration (etcd) system, generates a scheduling notification and sends the scheduling notification to the Kubernetes platform;
and the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification and consumes the data in the corresponding topic.
Further, the leading system comprises an nginx module and a lua module, wherein the nginx module receives the data, and the lua module analyzes the address information of the data and forwards the data to the corresponding topic of the Kafka platform according to the address information.
Further, the state information in the etcd system includes Kubernetes platform running state information, a service process ID and a pod ID, the information of the Kafka platform includes topic information and a statistical count, and the scheduling notification includes an indication of which pod consumes the data in the corresponding topic.
Further, the state information in the etcd system further includes a load state, the information of the Kafka platform further includes a received data volume, and the scheduling notification further includes an indication to generate a new pod, or an indication of which pod's resources to adjust and how to adjust them.
Further, the data is transmitted using the UDP protocol in the forwarding and consumption of the data.
The invention achieves parallel data access by adding an independent process (the leading system), and realizes a uniform API interface for receiving data streams; data are classified and forwarded through the nginx + lua module architecture, achieving dynamic load balancing; state updates of the Kubernetes platform are received through the etcd system, so that back-end pressure can be sensed in time; and the containers are controlled and adjusted by the scheduling system according to the data changes in the Kafka platform and the container states in the Kubernetes platform, so that containers can be started, closed, expanded and shrunk in a targeted manner according to their load, realizing elastic and flexible data processing.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram of a streaming data processing system according to the present invention;
FIG. 2 is a flow chart of a streaming data processing method according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
The invention improves on the existing Kubernetes platform: a front-end data access program is added to access the data stream and classify it, thereby realizing a uniform API interface and dynamic load balancing, and a scheduling system is added to realize elastic expansion and contraction of the containers in the Kubernetes platform. The following describes the aspects of the present invention in detail with reference to the accompanying drawings.
Referring to FIG. 1, there is shown a streaming data processing system according to the present invention, comprising a leading system 11, a Kafka platform 12, a scheduling system 13, a distributed service registration (etcd) system 14, and a Kubernetes platform 15. The leading system 11 is configured to receive the data, forward the data to the Kafka platform 12, and send a notification to the scheduling system; the Kafka platform 12 is used for storing the data forwarded by the leading system 11 in a pre-allocated topic; the scheduling system 13 is configured to, after receiving the notification of the leading system 11, generate a scheduling notification according to the information of the Kafka platform 12 and the state information in the etcd system 14, and send the scheduling notification to the Kubernetes platform 15; the etcd system 14 is configured to update and store the state information according to the reports of the Kubernetes platform 15; the Kubernetes platform 15 is used to control, generate or adjust pods according to the scheduling notification, and to consume the data in the corresponding topic. Here a pod comprises one or more containers for processing the business data, and "consumption" means that the pod acquires and processes the data in the corresponding topic.
Optionally, the leading system 11 includes an nginx module 111 and a lua module 112, where the nginx module 111 is configured to receive the data, and the lua module 112 is configured to analyze the header information of the data, classify the data according to the IP address information in the header, and forward the data to the corresponding topic of the Kafka platform according to a predefined routing policy. Thus, when different types of data streams are accessed, data from different IP sources can be classified and distributed.
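In the patent the classification and forwarding are performed inside nginx by the lua module; purely as an illustration of the routing logic (not the actual lua code), the following Python sketch maps a source IP prefix to a topic through a predefined routing table and produces the payload to Kafka with kafka-python. The broker address, routing table and topic names are assumptions.

    # Illustrative sketch of the leading system's routing logic (assumed names).
    from kafka import KafkaProducer

    ROUTING_POLICY = {            # predefined routing policy: source IP prefix -> topic
        "10.0.1.": "topic-sensor-a",
        "10.0.2.": "topic-sensor-b",
    }
    DEFAULT_TOPIC = "topic-default"

    producer = KafkaProducer(bootstrap_servers="kafka:9092")

    def forward(source_ip: str, payload: bytes) -> str:
        """Classify by source IP and forward the payload to the matching topic."""
        topic = next((t for prefix, t in ROUTING_POLICY.items()
                      if source_ip.startswith(prefix)), DEFAULT_TOPIC)
        producer.send(topic, payload)   # asynchronous send to the Kafka platform
        return topic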
Optionally, the state information in the etcd system 14 includes Kubernetes platform running state information, a service process ID and a pod ID, and the information of the Kafka platform 12 includes topic information and a statistical count; the scheduling system 13 generates a scheduling notification according to this information, where the notification indicates which pod consumes the data in the corresponding topic, so that the Kubernetes platform 15 may control the indicated pod to consume the data in that topic.
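As one possible way (the patent does not prescribe a client library) for the scheduling system to obtain the topic information and statistical count mentioned above, the sketch below uses kafka-python to compare committed offsets with partition end offsets as a backlog estimate; the broker address and consumer group are assumptions.

    # Sketch: estimate the backlog of a topic for scheduling decisions (assumed setup).
    from kafka import KafkaConsumer, TopicPartition

    def topic_backlog(topic: str, group_id: str, brokers: str = "kafka:9092") -> int:
        consumer = KafkaConsumer(bootstrap_servers=brokers, group_id=group_id,
                                 enable_auto_commit=False)
        partitions = [TopicPartition(topic, p)
                      for p in consumer.partitions_for_topic(topic)]
        end = consumer.end_offsets(partitions)               # newest offset per partition
        committed = {tp: (consumer.committed(tp) or 0) for tp in partitions}
        consumer.close()
        return sum(end[tp] - committed[tp] for tp in partitions)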
Optionally, the state information in the etcd system 14 further includes a load state, the information of the Kafka platform 12 further includes a received data volume, and the scheduling system 13 generates a scheduling notification according to the information, where the notification includes a resource indicating to generate a new pod or to indicate which pod is to be adjusted and how to adjust, so that the kubernets platform 15 may generate a new pod to consume data in the corresponding topic, or close, expand, and shrink the resource of the indicated pod, for example, a CPU, a memory, and a storage space for the pod, so that resource isolation, dynamic elastic scaling, and migration of each application are ensured.
The information related to the Kubernetes platform 15 is uploaded to the etcd system 14 periodically or aperiodically, so that the state information stored in the etcd system 14 is updated in real time.
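A minimal illustration of such reporting, assuming the python-etcd3 client and placeholder key names, is a lease-based heartbeat that keeps a pod's load state in etcd and lets stale entries expire automatically:

    # Sketch: report pod state to etcd under a lease (assumed key layout).
    import json
    import time

    import etcd3

    etcd = etcd3.client(host="etcd", port=2379)
    lease = etcd.lease(15)                       # the key disappears if reporting stops

    def report_state(pod_id: str, process_id: int, load: float) -> None:
        key = f"/streaming/pods/{pod_id}"
        value = json.dumps({"process_id": process_id, "load": load, "ts": time.time()})
        etcd.put(key, value, lease=lease)
        lease.refresh()                          # keep the lease (and the key) alive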
Optionally, the UDP protocol is used for transmitting data during the forwarding and consumption of the data, which improves transmission concurrency; a transmission capability of 10000 data items per second per node can be reached.
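Since the patent only names UDP and does not fix a message format, the following is a minimal send/receive sketch using Python's standard socket module; the host, port and payload are placeholders, and no framing or retransmission is shown.

    # Minimal UDP send/receive sketch (placeholder host/port, no framing or retries).
    import socket

    HOST, PORT = "127.0.0.1", 9999

    # Receiver side: a consumer pod reads datagrams as they arrive.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((HOST, PORT))

    # Sender side: the collector or leading system pushes one datagram per record.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b'{"source_ip": "10.0.1.7", "payload": "..."}', (HOST, PORT))

    data, addr = receiver.recvfrom(65535)
    print(f"received {len(data)} bytes from {addr}")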
Referring to FIG. 2, there is shown a streaming data processing method according to the present invention, comprising:
S21, the leading system receives the data, forwards the data to the Kafka platform and sends a notification to the scheduling system;
S22, the Kafka platform stores the data forwarded by the leading system in a pre-allocated topic;
S23, after receiving the notification, the scheduling system acquires the information of the Kafka platform and the state information in the etcd system, generates a scheduling notification and sends the scheduling notification to the Kubernetes platform;
and S24, the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification, and consumes the data in the corresponding topic (a consumer-side sketch follows below).
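Although the patent leaves the consumer implementation open, a pod assigned to a topic could consume it as in the following kafka-python sketch; the environment variable, broker address and group name are assumptions.

    # Sketch: a worker pod consuming the topic assigned to it by the scheduling
    # notification (topic name passed via an environment variable; assumed setup).
    import os

    from kafka import KafkaConsumer

    topic = os.environ.get("ASSIGNED_TOPIC", "topic-default")
    consumer = KafkaConsumer(topic,
                             bootstrap_servers="kafka:9092",
                             group_id="stream-workers",
                             auto_offset_reset="earliest")

    for message in consumer:
        # Business processing of one streaming record would happen here.
        print(f"{topic}[{message.partition}]@{message.offset}: {len(message.value)} bytes")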
Optionally, the leading system includes an nginx module and a lua module, the nginx module receives the data, and the lua module analyzes the address information of the data and forwards the data to the corresponding topic of the Kafka platform according to the address information.
Optionally, the state information in the etcd system includes Kubernetes platform running state information, a service process ID and a pod ID, the information of the Kafka platform includes topic information and a statistical count, and the scheduling notification includes an indication of which pod consumes the data in the corresponding topic.
Optionally, the state information in the etcd system further includes a load state, the information of the Kafka platform further includes a received data volume, and the scheduling notification further includes an indication to generate a new pod, or an indication of which pod's resources to adjust and how to adjust them.
Optionally, the data is transmitted using the UDP protocol in the forwarding and consumption of the data.
Therefore, the invention realizes a uniform API for receiving data streams, and realizes dynamic load balancing by classifying and forwarding data; back-end pressure can be sensed in time through the etcd system, and the scheduling system can start, close, expand and shrink containers in a targeted manner according to their load, realizing elastic and flexible data processing.
The following illustrates the specific implementation of the present invention by way of an example:
the collector sends streaming data;
the nginx module in the leading system receives the data and passes it to the lua module; the lua module analyzes the message header information of the data to obtain its source IP address, determines the topic corresponding to data from that IP address according to the predefined routing policy, calls a Kafka client API to forward the data to the corresponding topic of the Kafka platform, and notifies the scheduling system;
after receiving the notification, the scheduling system acquires the topic information, statistical count and received data volume from the Kafka platform, and acquires the state information of the Kubernetes platform registered in the etcd system, including the Kubernetes platform running state information, the service process ID, the pod ID and the load state, and generates a scheduling notification according to this information;
if it is determined from the topic information, the service process ID and the pod ID that no container is processing the data, the scheduling notification indicates generation of a new pod for processing the data in the corresponding topic;
if it is determined from the load state and the received data volume that the container resources are insufficient, the scheduling notification indicates expanding the pod's resources, for example increasing the pod's memory;
if it is determined from the load state and the received data volume that the container resources are excessive, the scheduling notification indicates shrinking the pod's resources, for example reducing the CPU allocated to the pod;
and the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification, and acquires and processes the corresponding business data in the topic through the specified pod (this decision logic is sketched below).
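Purely as a restatement of the decision rules just described (the thresholds, field names and notification structure are assumptions, since the patent does not fix concrete values), the scheduling logic could look like this:

    # Sketch of the scheduling decision described above (assumed thresholds/fields).
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class SchedulingNotification:
        topic: str
        action: str              # "create_pod" | "expand_pod" | "shrink_pod" | "none"
        pod_id: Optional[str] = None

    def decide(topic: str, pod_ids: List[str], load: float, received_volume: int,
               high_load: float = 0.8, low_load: float = 0.2) -> SchedulingNotification:
        if not pod_ids:
            # No container is processing this topic yet: ask for a new pod.
            return SchedulingNotification(topic, "create_pod")
        if load > high_load and received_volume > 0:
            # Container resources are insufficient: expand (e.g. more memory).
            return SchedulingNotification(topic, "expand_pod", pod_ids[0])
        if load < low_load:
            # Container resources are excessive: shrink (e.g. less CPU).
            return SchedulingNotification(topic, "shrink_pod", pod_ids[0])
        return SchedulingNotification(topic, "none")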
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer readable storage medium. The above is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A streaming data processing system, comprising:
the leading system is used for receiving the data, forwarding the data to the Kafka platform and sending a notification to the scheduling system;
the Kafka platform is used for storing the data forwarded by the leading system in a pre-allocated topic;
the scheduling system is used for, after receiving the notification of the leading system, generating a scheduling notification according to the information of the Kafka platform and the state information in the distributed service registration (etcd) system and sending the scheduling notification to the Kubernetes platform;
a distributed service registration (etcd) system, which is used for updating and storing the state information according to the reports of the Kubernetes platform;
and the Kubernetes platform is used for controlling, generating or adjusting the pod according to the scheduling notification and consuming the data in the corresponding topic.
2. The streaming data processing system according to claim 1, wherein the leading system comprises an nginx module and a lua module, the nginx module is configured to receive the data, and the lua module is configured to analyze the address information of the data and forward the data to the corresponding topic of the Kafka platform according to the address information.
3. The streaming data processing system of claim 1, wherein the state information in the etcd system includes Kubernetes platform running state information, a service process ID and a pod ID, wherein the information of the Kafka platform includes topic information and a statistical count, and wherein the scheduling notification includes an indication of which pod consumes the data in the corresponding topic.
4. The streaming data processing system of claim 3, wherein the state information in the etcd system further comprises a load state, wherein the information of the Kafka platform further comprises a received data volume, and wherein the scheduling notification further comprises an indication to generate a new pod, or an indication of which pod's resources to adjust and how to adjust them.
5. The streaming data processing system of any of claims 1-4, wherein the data is transmitted using the UDP protocol in the forwarding and consumption of the data.
6. A streaming data processing method, comprising:
the leading system receives the data, forwards the data to the Kafka platform and sends a notification to the scheduling system;
the Kafka platform stores the data forwarded by the leading system in a pre-allocated topic;
after receiving the notification of the leading system, the scheduling system acquires the information of the Kafka platform and the state information in the distributed service registration (etcd) system, generates a scheduling notification and sends the scheduling notification to the Kubernetes platform;
and the Kubernetes platform controls, generates or adjusts the pod according to the scheduling notification and consumes the data in the corresponding topic.
7. The method according to claim 6, wherein the leading system comprises an nginx module and a lua module, the nginx module receives the data, and the lua module analyzes the address information of the data and forwards the data to the corresponding topic of the Kafka platform according to the address information.
8. The method of claim 6, wherein the state information in the etcd system comprises Kubernetes platform running state information, a service process ID and a pod ID, wherein the information of the Kafka platform comprises topic information and a statistical count, and wherein the scheduling notification comprises an indication of which pod consumes the data in the corresponding topic.
9. The method of claim 8, wherein the state information in the etcd system further comprises a load state, wherein the information of the Kafka platform further comprises a received data volume, and wherein the scheduling notification further comprises an indication to generate a new pod, or an indication of which pod's resources to adjust and how to adjust them.
10. The method according to any of claims 6-9, wherein the data is transmitted using the UDP protocol in the forwarding and consumption of the data.
CN201910023008.2A 2019-01-10 2019-01-10 Streaming data processing system and method Active CN111431955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910023008.2A CN111431955B (en) 2019-01-10 2019-01-10 Streaming data processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910023008.2A CN111431955B (en) 2019-01-10 2019-01-10 Streaming data processing system and method

Publications (2)

Publication Number Publication Date
CN111431955A CN111431955A (en) 2020-07-17
CN111431955B true CN111431955B (en) 2023-03-24

Family

ID=71546067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910023008.2A Active CN111431955B (en) 2019-01-10 2019-01-10 Streaming data processing system and method

Country Status (1)

Country Link
CN (1) CN111431955B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065785B (en) * 2021-04-13 2024-02-20 国网江苏省电力有限公司信息通信分公司 Dynamic resource expansion method for electric power Internet of things management platform
CN114048108A (en) * 2022-01-12 2022-02-15 中科星图智慧科技有限公司 Automatic treatment method and device for multi-source heterogeneous data
CN114553866B (en) * 2022-01-19 2024-09-17 深圳力维智联技术有限公司 Full data access method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105897946A (en) * 2016-04-08 2016-08-24 北京搜狐新媒体信息技术有限公司 Obtaining method and system of access address
CN106888254A (en) * 2017-01-20 2017-06-23 华南理工大学 A kind of exchange method between container cloud framework based on Kubernetes and its each module
CN108304267A (en) * 2018-01-31 2018-07-20 中科边缘智慧信息科技(苏州)有限公司 The multi-source data of highly reliable low-resource expense draws the method for connecing
CN108769100A (en) * 2018-04-03 2018-11-06 郑州云海信息技术有限公司 A kind of implementation method and its device based on kubernetes number of containers elastic telescopics
CN109151464A (en) * 2018-11-14 2019-01-04 江苏鸿信系统集成有限公司 IPTV set top box failure real-time detection method based on high amount of traffic processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9239997B2 (en) * 2013-03-15 2016-01-19 The United States Of America As Represented By The Secretary Of The Navy Remote environmental and condition monitoring system

Also Published As

Publication number Publication date
CN111431955A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
US9154382B2 (en) Information processing system
US11336580B2 (en) Methods, apparatuses and computer program products for transmitting data
JP6592595B2 (en) Method and system for managing data traffic in a computing network
US8082364B1 (en) Managing state information in a computing environment
CN101370035B (en) Method and system for dynamic client/server network management using proxy servers
TWI345397B (en) Method and system for stale data detection based quality of service
US7630379B2 (en) Systems and methods for improved network based content inspection
CN102763380B (en) For the system and method for routing packets
CN102771094B (en) Distributed routing framework
EP2163072B1 (en) Content-based routing
CN111431955B (en) Streaming data processing system and method
US20030229674A1 (en) Internet scaling in a PUB/SUB ENS
GB2439195A (en) Self-managed distributed mediation networks
US12126529B2 (en) Multi-tier deterministic networking
CN108123878B (en) Routing method, routing device and data forwarding equipment
CN105376292A (en) Explicit strategy feedback in name-based forwarding
JP6256343B2 (en) Network, network node, distribution method, and network node program
CN115665162A (en) Intelligent shunting engine for gray scale release
Kamiyama et al. Cache replacement based on distance to origin servers
US8289872B2 (en) System and method for assigning information categories to multicast groups
US7574525B2 (en) System and method for managing communication between server nodes contained within a clustered environment
CN103442257A (en) Method, device and system for achieving flow resource management
US8233480B2 (en) Network clustering
CN111970149A (en) Shared bandwidth realizing method based on hardware firewall QOS
Sedaghat et al. R2T-DSDN: reliable real-time distributed controller-based SDN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant