CN108197233A - Data management method, middleware and data management system - Google Patents

Data management method, middleware and data management system

Info

Publication number
CN108197233A
Authority
CN
China
Prior art keywords
data
subject information
gathered
middleware
gathered data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711473196.6A
Other languages
Chinese (zh)
Inventor
颜健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd
Priority to CN201711473196.6A
Publication of CN108197233A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a data management method, middleware and data management system. The method and middleware obtain the collected data of a data acquisition side, generate topic information for the collected data, encapsulate the collected data together with the topic information, and send the collected data packaged with the topic information to a data storage side, so that the data storage side stores the collected data at the storage location of the corresponding topic according to that topic information. The collected data of the data acquisition side is thereby written to the data storage side. With this scheme, middleware can write the data collected by a data acquisition side such as Flume OG to a data storage side such as Kafka, which solves the prior-art problem that early Flume versions, lacking a Kafka plug-in, cannot write the logs they collect to Kafka.

Description

Data management method, middleware and data management system
Technical field
The invention belongs to the technical field of middleware, and in particular relates to a data management method, middleware and data management system.
Background art
Flume OG is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating and transmitting massive volumes of logs.
Flume supports customizing various data senders in a log system for collecting data. The collected data needs to be written to a storage system such as Kafka for real-time computation and data cleansing. However, early Flume versions such as Flume OG have no Kafka plug-in and cannot write the collected logs to a storage system such as Kafka, so a middleware needs to be developed to receive Flume logs and write them to such a storage system.
Summary of the invention
In view of this, the purpose of the present invention is to provide a data management method, middleware and data management system that solve the problem that early Flume versions such as Flume OG, having no Kafka plug-in, cannot write the collected logs to Kafka.
To this end, the present invention discloses the following technical solutions:
A data management method, applied to middleware, the method comprising:
Obtaining the collected data of a data acquisition side;
Generating topic information for the collected data, and encapsulating the collected data and the topic information to obtain collected data that includes the topic information;
Sending the collected data including the topic information to a data storage side, so that the data storage side stores the collected data at the storage location corresponding to the respective topic according to the topic information of the collected data.
In the above method, preferably, the data acquisition side is a Flume log system, and obtaining the collected data of the data acquisition side comprises:
Receiving, based on a predetermined protocol, each piece of log data collected by the Flume log system;
Buffering each received piece of log data in a pre-created blocking queue.
In the above method, preferably, the data storage side is the Kafka distributed publish-subscribe messaging system, and sending the collected data including the topic information to the data storage side comprises:
Sending, based on a pre-created thread pool, the log data including the topic information to the Kafka distributed publish-subscribe messaging system.
In the above method, preferably, before sending the collected data including the topic information to the data storage side, the method further comprises:
Obtaining blacklist and whitelist topic information;
When the topic information corresponding to the collected data is blacklisted topic information or is not whitelisted topic information, filtering out the collected data;
When the topic information corresponding to the collected data is not blacklisted topic information or is whitelisted topic information, triggering the step of sending the collected data including the topic information to the storage location corresponding to the respective topic on the data storage side.
In the above method, preferably, the method further comprises:
Monitoring the data traffic of the middleware, and raising an alarm when the data traffic is abnormal.
A middleware, comprising:
A data acquisition unit, configured to obtain the collected data of a data acquisition side;
A topic generation unit, configured to generate topic information for the collected data, and to encapsulate the collected data and the topic information to obtain collected data that includes the topic information;
A data transmission unit, configured to send the collected data including the topic information to a data storage side, so that the data storage side stores the collected data at the storage location corresponding to the respective topic according to the topic information of the collected data.
In the above middleware, preferably, the data acquisition side is a Flume log system, and the data storage side is the Kafka distributed publish-subscribe messaging system;
The data acquisition unit is then specifically configured to:
Receive, based on a predetermined protocol, each piece of log data collected by the Flume log system, and buffer each received piece of log data in a pre-created blocking queue;
Correspondingly, the data transmission unit is specifically configured to:
Send, based on a pre-created thread pool, the log data including the topic information to the Kafka distributed publish-subscribe messaging system.
In the above middleware, preferably, the middleware further comprises:
A blacklist/whitelist management unit, configured to:
Obtain blacklist and whitelist topic information; when the topic information corresponding to the collected data is blacklisted topic information or is not whitelisted topic information, filter out the collected data; when the topic information corresponding to the collected data is not blacklisted topic information or is whitelisted topic information, trigger the data transmission unit.
In the above middleware, preferably, the middleware further comprises:
A traffic monitoring unit, configured to monitor the data traffic of the middleware and to raise an alarm when the data traffic is abnormal.
A data management system, comprising a server cluster, wherein a middleware as described above is deployed on each server in the server cluster.
In the above system, preferably, the system further comprises:
A load balancing management device, configured to monitor and manage the load of each server in the server cluster so that the load across the servers in the cluster is balanced.
As the above scheme shows, this application provides a data management method, middleware and data management system. The method and middleware obtain the collected data of a data acquisition side, generate topic information for the collected data, encapsulate the collected data together with the topic information, and send the collected data packaged with the topic information to a data storage side, so that the data storage side stores the collected data at the storage location of the corresponding topic according to that topic information. The collected data of the data acquisition side is thereby written to the data storage side. With this scheme, middleware can write the data collected by a data acquisition side such as Flume OG to a data storage side such as Kafka, which solves the prior-art problem that early Flume versions, lacking a Kafka plug-in, cannot write the logs they collect to Kafka.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a data management method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the working principle of the Flume log system;
Fig. 3 is a flow chart of another data management method provided by an embodiment of the present invention;
Fig. 4 is a flow chart of a further data management method provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of a middleware provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of another middleware provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of a further middleware provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of a data management system provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of another data management system provided by an embodiment of the present invention.
Detailed description of the embodiments
For ease of reference and clarity, the technical terms, shorthand and abbreviations used below are explained as follows:
Flume: Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating and transmitting massive volumes of logs. Flume supports customizing various data senders in a log system for collecting data; it also provides the ability to perform simple processing on the data and to write it to various (customizable) data receivers.
Kafka: Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Such actions (web page browsing, searching and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. For systems such as Hadoop that handle log data and offline analysis but are also constrained by real-time processing requirements, this is a feasible solution. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time consumption across a cluster of machines.
Avro: Avro is a data serialization system. It provides rich data structure types, a fast and compressible binary data format, a container file for storing persistent data, remote procedure call (RPC), and simple integration with dynamic languages. When Avro is combined with a dynamic language, code generation is needed neither to read and write data files nor to use the RPC protocol; code generation is an optional optimization that is only worth implementing in statically typed languages.
LVS: LVS uses clustering technology and the Linux operating system to implement a high-performance, highly available server with good scalability, reliability and manageability.
Zookeeper: ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization and group services.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. Evidently, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of this application first discloses a data management method. The method can be applied in middleware to solve the problem that early Flume versions such as Flume OG, having no Kafka plug-in, cannot write the collected logs to Kafka. With reference to the flow chart of the data management method shown in Fig. 1, the method includes:
Step 101: obtain the collected data of the data acquisition side.
The data management method of this embodiment can be implemented in the form of middleware, which writes the data collected by the data acquisition side to the data storage side.
The data acquisition side can be, but is not limited to, a Flume log system, for example Flume OG specifically. Flume OG is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating and transmitting massive volumes of logs. As shown in Fig. 2, the official logical architecture of Flume comprises agents, collectors and storage. An agent is where data streams are generated in Flume and collects data (for example the log data of each application system); a collector aggregates the data of multiple agents and loads it into storage; storage is the storage system, which may be an ordinary file, the Kafka distributed publish-subscribe messaging system, HDFS (Hadoop Distributed File System), and so on.
The data storage side is this storage system. As described above, it may be an ordinary file, the Kafka distributed publish-subscribe messaging system, HDFS, and so on; this embodiment does not limit it.
In what follows, this embodiment illustrates the scheme of the application with a Flume log system as the data acquisition side and the Kafka distributed publish-subscribe messaging system as the data storage side.
When the data acquisition side is a Flume log system, the collected data obtained in this step is correspondingly the log data of the Flume log system, that is, the log data that the Flume log system collects from various application systems.
When the scheme is implemented as middleware, the transfer of log data from the Flume log system to the middleware uses a data transfer protocol agreed in advance, such as the Avro protocol or HTTP (HyperText Transfer Protocol). Taking the Avro protocol as an example, an AvroSource interface that conforms to the Avro protocol can be developed in the middleware in advance according to the Flume API (Application Programming Interface), together with an Avro service. The interface information of the completed AvroSource interface can then be configured in Flume and the Avro service in the middleware started, after which the Avro service obtains log data from the Flume log system over the Avro protocol through the AvroSource interface.
For the log data obtained by the Avro service, a LogHandler (log management) service can be developed and started in the middleware in advance, and this service creates a blocking queue. The LogHandler service then receives the log data from the Avro service and buffers it in the blocking queue to await subsequent processing.
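The buffering role of the blocking queue can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name LogBuffer, its method names and the capacity parameter are hypothetical and do not come from the application, and log data is modeled as plain strings.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the LogHandler service: buffers received log lines. */
class LogBuffer {
    private final BlockingQueue<String> queue;

    LogBuffer(int capacity) {
        // A bounded queue makes the receiving side wait when senders fall behind.
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Called by the receiving side; blocks while the queue is full. */
    void accept(String logLine) {
        try {
            queue.put(logLine);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while buffering log", e);
        }
    }

    /** Called by sender threads; returns null if nothing arrives within the timeout. */
    String next(long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int size() {
        return queue.size();
    }
}
```

The queue decouples the rate at which logs arrive from the rate at which they are forwarded; whether a bounded or unbounded queue is appropriate is a design choice the application leaves open.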
Step 102: generate the topic information of the collected data, and encapsulate the collected data and the topic information to obtain collected data that includes the topic information.
The Kafka distributed publish-subscribe messaging system usually produces data by topic (data production here means Flume sending logs to Kafka) and provides a separate data channel for each topic, so that log messages of different topics are received/produced over different data channels. In view of this, in order to write Flume log data to Kafka and to dock with Kafka's different data channels, in this embodiment the middleware, after obtaining each piece of log data from the Flume log system, generates the topic information (topic) of that piece of log data and encapsulates the generated topic information and the log data as one unit.
The topic information of log data can be a topic divided by the category to which the log data belongs, such as military, entertainment or education; or it can be a topic divided by the source of the log data, such as application system 1, application system 2, and so on. This embodiment does not limit this. In practice, the log topics generated in the middleware should conform to the topic division scheme that a data storage side such as Kafka uses when storing data by topic.
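A minimal Java sketch of topic generation and encapsulation, dividing topics by log source, might look like the following. The types TopicRecord and TopicGenerator, and the assumption that each log line begins with its source name followed by a space, are illustrative only; the application does not specify a log format or a derivation rule.

```java
import java.util.Objects;

/** A log line packaged together with its topic (hypothetical type). */
final class TopicRecord {
    final String topic;
    final String payload;

    TopicRecord(String topic, String payload) {
        this.topic = Objects.requireNonNull(topic);
        this.payload = Objects.requireNonNull(payload);
    }
}

final class TopicGenerator {
    /**
     * Assumes each line starts with "<source> " (for example "app1 ...") and
     * uses the source as the topic; lines without such a prefix fall back to
     * the topic "unknown". The original line is kept unchanged as the payload.
     */
    static TopicRecord wrap(String logLine) {
        int space = logLine.indexOf(' ');
        String topic = space > 0 ? logLine.substring(0, space) : "unknown";
        return new TopicRecord(topic, logLine);
    }
}
```

For example, TopicGenerator.wrap("app1 user login") yields a record whose topic is "app1". A category-based division would replace only the rule inside wrap.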
Step 103: send the collected data including the topic information to the data storage side, so that the data storage side stores the collected data at the storage location corresponding to the respective topic according to the topic information of the collected data.
After the topic information of the log data has been generated and packaged together with the log data, the log data can be sent to a data storage side such as Kafka over the data channel corresponding to the topic information encapsulated in it, so that the data storage side stores the collected data at the storage location corresponding to the respective topic according to that topic information.
In a specific implementation, the Kafka configuration can be read in advance and a thread pool initialized. Each thread in the pool works independently, and a HashMap and a List are created in each thread for caching logs. Each time a thread takes a piece of log data from the blocking queue, it buffers the log data in the List and stores the List in the HashMap with the topic as the key. Later, when the number of logs stored in a List reaches a predetermined threshold, the logs in the List are produced (transmitted) to Kafka in one batch, by topic, over the corresponding data channel, and the List is emptied. This processing effectively reduces the number of connections between Kafka and the middleware, thereby reducing the pressure on Kafka.
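The per-thread batching described above can be sketched in Java as follows. The class TopicBatcher and its sender callback are hypothetical: the callback stands in for whatever Kafka producer call the middleware would actually make, which the application does not detail, and the threshold value is chosen by the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Per-thread batcher: not thread-safe by design, each worker owns one instance. */
final class TopicBatcher {
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final int threshold;
    // Hypothetical hook that would produce one batch to Kafka for a topic.
    private final BiConsumer<String, List<String>> sender;

    TopicBatcher(int threshold, BiConsumer<String, List<String>> sender) {
        this.threshold = threshold;
        this.sender = sender;
    }

    /** Buffers one log under its topic; flushes that topic's List at the threshold. */
    void add(String topic, String log) {
        List<String> batch = buffers.computeIfAbsent(topic, t -> new ArrayList<>());
        batch.add(log);
        if (batch.size() >= threshold) {
            sender.accept(topic, new ArrayList<>(batch)); // hand over a copy
            batch.clear();                                 // then empty the List
        }
    }

    /** Number of logs currently buffered for a topic. */
    int pending(String topic) {
        List<String> batch = buffers.get(topic);
        return batch == null ? 0 : batch.size();
    }
}
```

Because each worker thread owns its own HashMap and List, no locking is needed on the buffers. A production version would probably also flush on a timer so that low-volume topics are not held back indefinitely; that is an addition not described in the application.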
The data management method provided by this embodiment obtains the collected data of the data acquisition side, generates topic information for the collected data, encapsulates the collected data together with the topic information, and sends the collected data packaged with the topic information to the data storage side, so that the data storage side stores the collected data at the storage location of the corresponding topic according to that topic information. The collected data of the data acquisition side is thereby written to the data storage side. With this scheme, middleware can write the data collected by a data acquisition side such as Flume OG to a data storage side such as Kafka, which solves the prior-art problem that early Flume versions, lacking a Kafka plug-in, cannot write the logs they collect to Kafka.
In another embodiment of this application, with reference to the flow chart of another data management method shown in Fig. 3, the data management method can further include, before step 103:
Step 104: obtain blacklist and whitelist topic information;
Step 105: when the topic information corresponding to the collected data is blacklisted topic information or is not whitelisted topic information, filter out the collected data; when the topic information corresponding to the collected data is not blacklisted topic information or is whitelisted topic information, trigger step 103.
This embodiment additionally provides the middleware with a blacklist/whitelist management function for topics. Specifically, a topic blacklist and whitelist can be maintained according to Kafka's actual data production requirements. The lists record the corresponding whitelisted or blacklisted topic information: the whitelisted topic information comprises the topics of the data that Kafka needs, and the blacklisted topic information correspondingly comprises the topics of the data that Kafka does not need.
To transmit to Kafka only the log data of the topics it needs, the middleware, after obtaining log data from the Flume log system and generating its topic, can match that topic against the maintained topic blacklist and whitelist. If the topic of the log data is blacklisted or is not whitelisted, the log data does not belong to a topic Kafka needs, and the data is filtered out. Conversely, if the topic of the log data is not blacklisted or is whitelisted, the log data belongs to a topic Kafka needs, so it can be written to Kafka over the data channel corresponding to its topic, and Kafka then stores it at the storage location of that topic. For example, log data whose topic is "education" is written to the storage location in Kafka that corresponds to the topic "education".
When implementing this application, Jetty, a lightweight Java-based web container, can be started, and a web service for managing the topic blacklist and whitelist run on Jetty. Log data is no longer produced for a topic added to the blacklist (that is, log data of that topic is no longer transmitted to Kafka).
By maintaining a topic blacklist and whitelist, this embodiment provides a log management function based on those lists. It can filter out the log data that a data storage side such as Kafka does not need, so that only log data of the topics the storage side needs is transmitted to it.
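One possible reading of the filtering rule can be sketched in Java as follows. The class TopicFilter and its method names are hypothetical, and the policy for combining the two lists is an assumption, since the application states the blacklist and whitelist conditions without saying how they interact: here a blacklisted topic is always dropped, and once any whitelist entry exists, only whitelisted topics pass.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical topic filter; safe to update from a management thread. */
final class TopicFilter {
    private final Set<String> blacklist = ConcurrentHashMap.newKeySet();
    private final Set<String> whitelist = ConcurrentHashMap.newKeySet();

    void block(String topic) { blacklist.add(topic); }

    void allow(String topic) { whitelist.add(topic); }

    /**
     * Assumed policy: a blacklisted topic is always dropped; if a whitelist
     * has been configured, only whitelisted topics pass; otherwise every
     * topic that is not blacklisted passes.
     */
    boolean shouldSend(String topic) {
        if (blacklist.contains(topic)) {
            return false;
        }
        return whitelist.isEmpty() || whitelist.contains(topic);
    }
}
```

Concurrent sets are used so that a management service can change the lists while sender threads are consulting them.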
In a further embodiment of this application, with reference to the flow chart of a further data management method shown in Fig. 4, the data management method can further include:
Step 106: monitor the data traffic of the middleware, and raise an alarm when the data traffic is abnormal.
In this embodiment, a counting module is used to monitor the traffic of the middleware.
Specifically, while the middleware obtains Flume log data, the counting module counts in real time the number and size of the logs the middleware receives (the log size can be obtained by accumulating the number of bytes received). The data traffic of the middleware can then be derived from the counted log number and log size, which allows the traffic to be monitored and an alarm raised when abnormal traffic is detected. For example, a traffic anomaly alarm can be raised when the difference between the traffic counted in one monitoring period and the traffic counted in the previous monitoring period exceeds a set threshold, or when the counted traffic is not within a predetermined normal range.
As one possible implementation, each time a thread in the thread pool takes a log from the blocking queue and buffers it in the List, it also adds that log's information, such as the log count and log size, to the counting module. In this way the number and size of the logs received by the middleware are counted, and the data traffic of the middleware is monitored.
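The counting module and the two anomaly conditions can be sketched in Java as follows. The class TrafficCounter, its method names and all threshold parameters are hypothetical; the application does not specify how periods are scheduled or which values count as normal, and measuring log size as UTF-8 bytes is likewise an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counting module shared by all sender threads. */
final class TrafficCounter {
    private final LongAdder logCount = new LongAdder();
    private final LongAdder byteCount = new LongAdder();

    /** Records one received log: one more entry, plus its size in UTF-8 bytes. */
    void record(String log) {
        logCount.increment();
        byteCount.add(log.getBytes(StandardCharsets.UTF_8).length);
    }

    /** Returns {count, bytes} for the period that just ended and resets both. */
    long[] snapshotAndReset() {
        return new long[] { logCount.sumThenReset(), byteCount.sumThenReset() };
    }

    /**
     * Flags a period whose volume lies outside [min, max] or differs from the
     * previous period by more than maxDelta.
     */
    static boolean isAbnormal(long current, long previous,
                              long min, long max, long maxDelta) {
        return current < min || current > max
                || Math.abs(current - previous) > maxDelta;
    }
}
```

A scheduler would call snapshotAndReset once per monitoring period and pass the result, together with the previous period's value, to isAbnormal before deciding whether to alert.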
In practice, a ZooKeeper instance can be deployed and started in advance, and the ZooKeeper address and the interval at which the counting module sends/publishes data to ZooKeeper configured in the counting module. On this basis, a counter can be started on the middleware for each topic, and nodes created on ZooKeeper to store the count results published by the counters, thereby monitoring the traffic of the middleware. The statistics of the counters are displayed on ZooKeeper in a format such as JSON, and an alarm is raised when the traffic is abnormal. The topic blacklist/whitelist management function can also be deployed in ZooKeeper to manage the topic blacklist and whitelist.
Through the counting module, this embodiment monitors the data traffic of the middleware in real time and raises an alarm promptly when that traffic is abnormal, effectively overcoming the prior-art problem that the inability to monitor data traffic in real time fails to meet the needs of a production environment.
A kind of middleware disclosed in the next embodiment of the application, the structure of the middleware with reference to shown in figure 5 are shown It is intended to, the middleware includes:
Data capture unit 501, for obtaining the gathered data of data acquisition side;
Theme generation unit 502 for generating the subject information of the gathered data, and encapsulates the gathered data and institute Subject information is stated, obtains the gathered data for including subject information;
Data transmission unit 503, for sending the gathered data including subject information to data storage side, so that number According to storage root according to the subject information of the gathered data by the storage position corresponding to the acquired data storage to corresponding theme Put place.
In an embodiment of the embodiment of the present application, the data capture unit 501 is specifically used for:Based on predetermined association View receives every daily record data of Flume log systems acquisition;Every daily record data of reception is buffered in the resistance being pre-created It fills in queue;Correspondingly, the data transmission unit 503, is specifically used for:Based on the thread pool being pre-created, transmission includes The daily record data of subject information to Kafka distributed posts are subscribed at the storage location in message system corresponding to corresponding theme.
In an embodiment of the embodiment of the present application, as shown in fig. 6, the middleware can also include:Black and white lists Administrative unit 504, is used for:Obtain black and white lists subject information;When the corresponding subject information of the gathered data is blacklist master When inscribing information or non-white list subject information, the gathered data is filtered out;When the corresponding subject information of the gathered data is When non-blacklist subject information or white list subject information, the transmitting element is triggered.
In one implementation of the embodiments of the present application, as shown in Figure 7, the middleware may further include a traffic monitoring unit 505, configured to monitor the data traffic of the middleware and to raise an alert when the data traffic is abnormal.
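One possible reading of "abnormal" is a record rate that leaves a configured band. The following Python sketch assumes exactly that and nothing more: the sliding window, the `min_rate` and `max_rate` thresholds, and the `TrafficMonitor` class are illustrative choices, not details taken from the patent. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import deque


class TrafficMonitor:
    """Sliding-window rate check; alerts when the rate leaves [min_rate, max_rate]."""

    def __init__(self, window_seconds, min_rate, max_rate, alert):
        self.window = window_seconds
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.alert = alert        # hypothetical callback, e.g. send a message
        self.events = deque()     # (timestamp, record_count) pairs
        self.total = 0

    def record(self, now, count=1):
        self.events.append((now, count))
        self.total += count

    def check(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, count = self.events.popleft()
            self.total -= count
        rate = self.total / self.window
        if rate < self.min_rate or rate > self.max_rate:
            self.alert(f"abnormal traffic: {rate:.2f} records/s")
            return True
        return False


alerts = []
monitor = TrafficMonitor(window_seconds=10.0, min_rate=1.0, max_rate=5.0,
                         alert=alerts.append)
monitor.record(now=1.0, count=100)
monitor.check(now=2.0)    # burst: 100 records in a 10 s window
monitor.check(now=20.0)   # silence: the burst has left the window
print(alerts)
```

Checking for a rate that is too low as well as too high also catches the case where the data acquisition side has silently stopped sending.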
Because the middleware disclosed in the embodiments of the present invention corresponds to the data management method disclosed in the above embodiments, it is described only briefly; for the related and similar points, refer to the description of the data management method in the above embodiments, which is not repeated here.
The following embodiment of the present application discloses a data management system. Referring to the schematic structural diagram of the data management system shown in Figure 8, the data management system includes a server cluster 801, wherein a middleware as described above is deployed on each server in the server cluster, and the data management system can, based on the middleware deployed on each server, write the data of a data acquisition side such as Flume to a data storage side such as Kafka.
Referring to the schematic structural diagram of another data management system shown in Figure 9, in addition to the server cluster 801, the data management system may further include a load balancing management device 802, configured to perform load monitoring and management on each server in the server cluster so that the load of the servers in the server cluster is balanced.
In this embodiment, load balancing management of the middleware across multiple machines is implemented by configuring LVS (Linux Virtual Server).
Specifically, LVS may be deployed on one of the servers in the server cluster or on an additional server independent of the server cluster, and a single virtual IP (Internet Protocol) address is configured uniformly for all servers in the server cluster on which the middleware is deployed. On this basis, load monitoring and management can be performed on each server in the server cluster through LVS, so that the load of the servers in the server cluster is balanced.
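LVS itself is configured at the operating-system level rather than written as application code, and the text does not say which scheduling policy is used. Purely to illustrate the idea of spreading connections across the middleware servers behind one virtual IP, the following Python sketch models least-connection scheduling, which is one of the policies LVS offers. The class `LeastConnectionBalancer` and the server names are hypothetical.

```python
from typing import Dict, Iterable


class LeastConnectionBalancer:
    """Toy model of least-connection scheduling behind a single virtual IP."""

    def __init__(self, servers: Iterable[str]) -> None:
        # Active connection count per middleware server.
        self.active: Dict[str, int] = {name: 0 for name in servers}

    def acquire(self) -> str:
        # Pick the server with the fewest active connections;
        # ties go to the server that was registered first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        if self.active[server] > 0:
            self.active[server] -= 1


balancer = LeastConnectionBalancer(["mw-1", "mw-2", "mw-3"])
first = [balancer.acquire() for _ in range(3)]
balancer.release("mw-2")
print(first, balancer.acquire())
```

In this model, a new connection from the data acquisition side always goes to the least-loaded middleware instance, which is the effect the load balancing management device is meant to achieve.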
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
For convenience of description, the above system or apparatus is described as divided by function into various modules or units. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments.
Finally, it should also be noted that, herein, relational terms such as first, second, third, and fourth are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A data management method, characterized in that it is applied to a middleware, the method comprising:
Obtaining the gathered data of a data acquisition side;
Generating the subject information of the gathered data, and encapsulating the gathered data and the subject information to obtain gathered data including subject information;
Sending the gathered data including subject information to a data storage side, so that the data storage side, according to the subject information of the gathered data, stores the gathered data at the storage location corresponding to the respective topic.
2. The method according to claim 1, characterized in that the data acquisition side is a Flume log system, and obtaining the gathered data of the data acquisition side comprises:
Receiving, based on a predetermined protocol, each log entry collected by the Flume log system;
Buffering each received log entry in a pre-created blocking queue.
3. The method according to claim 2, characterized in that the data storage side is a Kafka distributed publish-subscribe messaging system, and sending the gathered data including subject information to the data storage side comprises:
Sending, based on a pre-created thread pool, the log data including subject information to the Kafka distributed publish-subscribe messaging system.
4. The method according to claim 1, characterized in that, before sending the gathered data including subject information to the data storage side, the method further comprises:
Obtaining blacklist/whitelist subject information;
Filtering out the gathered data when the subject information corresponding to the gathered data is blacklisted subject information or is not whitelisted subject information;
When the subject information corresponding to the gathered data is not blacklisted subject information or is whitelisted subject information, triggering the step of sending the gathered data including subject information to the storage location corresponding to the respective topic on the data storage side.
5. The method according to claim 1, characterized in that it further comprises:
Monitoring the data traffic of the middleware, and raising an alert when the data traffic is abnormal.
6. A middleware, characterized in that it comprises:
A data capture unit, configured to obtain the gathered data of a data acquisition side;
A topic generation unit, configured to generate the subject information of the gathered data and to encapsulate the gathered data and the subject information, obtaining gathered data including subject information;
A data transmission unit, configured to send the gathered data including subject information to a data storage side, so that the data storage side, according to the subject information of the gathered data, stores the gathered data at the storage location corresponding to the respective topic.
7. The middleware according to claim 6, characterized in that the data acquisition side is a Flume log system and the data storage side is a Kafka distributed publish-subscribe messaging system;
The data capture unit is then specifically configured to:
Receive, based on a predetermined protocol, each log entry collected by the Flume log system; and buffer each received log entry in a pre-created blocking queue;
Correspondingly, the data transmission unit is specifically configured to:
Send, based on a pre-created thread pool, the log data including subject information to the Kafka distributed publish-subscribe messaging system.
8. The middleware according to claim 6, characterized in that it further comprises:
A blacklist/whitelist management unit, configured to:
Obtain blacklist/whitelist subject information; filter out the gathered data when the subject information corresponding to the gathered data is blacklisted subject information or is not whitelisted subject information; and trigger the data transmission unit when the subject information corresponding to the gathered data is not blacklisted subject information or is whitelisted subject information.
9. The middleware according to claim 6, characterized in that it further comprises:
A traffic monitoring unit, configured to monitor the data traffic of the middleware and to raise an alert when the data traffic is abnormal.
10. A data management system, characterized in that it comprises a server cluster, wherein a middleware according to any one of claims 6-9 is deployed on each server in the server cluster.
11. The system according to claim 10, characterized in that it further comprises:
A load balancing management device, configured to perform load monitoring and management on each server in the server cluster, so that the load of the servers in the server cluster is balanced.
CN201711473196.6A 2017-12-29 2017-12-29 A kind of data managing method, middleware and data management system Pending CN108197233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711473196.6A CN108197233A (en) 2017-12-29 2017-12-29 A kind of data managing method, middleware and data management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711473196.6A CN108197233A (en) 2017-12-29 2017-12-29 A kind of data managing method, middleware and data management system

Publications (1)

Publication Number Publication Date
CN108197233A true CN108197233A (en) 2018-06-22

Family

ID=62586403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711473196.6A Pending CN108197233A (en) 2017-12-29 2017-12-29 A kind of data managing method, middleware and data management system

Country Status (1)

Country Link
CN (1) CN108197233A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108989314A (en) * 2018-07-20 2018-12-11 北京木瓜移动科技股份有限公司 A kind of Transmitting Data Stream, processing method and processing device
CN109325200A (en) * 2018-07-25 2019-02-12 北京京东尚科信息技术有限公司 Obtain the method, apparatus and computer readable storage medium of data
CN109525448A (en) * 2019-01-10 2019-03-26 北京智信未来信息技术有限公司 Log data acquisition system and method
CN109657125A (en) * 2018-12-14 2019-04-19 平安城市建设科技(深圳)有限公司 Data processing method, device, equipment and storage medium based on web crawlers
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN110515619A (en) * 2019-08-09 2019-11-29 济南浪潮数据技术有限公司 Theme creation method, device and equipment and readable storage medium
CN110569112A (en) * 2019-09-12 2019-12-13 华云超融合科技有限公司 Log data writing method and object storage daemon device
CN110688383A (en) * 2019-09-26 2020-01-14 中国银行股份有限公司 Data acquisition method and system
CN110889132A (en) * 2019-11-04 2020-03-17 中盈优创资讯科技有限公司 Distributed application permission verification method and device
CN111143314A (en) * 2019-12-26 2020-05-12 厦门服云信息科技有限公司 Log analysis method and system based on high-speed streaming processing technology
CN111200637A (en) * 2019-12-20 2020-05-26 新浪网技术(中国)有限公司 Cache processing method and device
CN111625452A (en) * 2020-05-22 2020-09-04 上海哔哩哔哩科技有限公司 Flow playback method and system
WO2020211622A1 (en) * 2019-04-16 2020-10-22 深圳前海微众银行股份有限公司 Blockchain-based message storage method and device
CN112261069A (en) * 2020-12-22 2021-01-22 国网江苏省电力有限公司信息通信分公司 Message blacklist generation method for electric power internet of things management platform
CN112527618A (en) * 2020-12-17 2021-03-19 中国农业银行股份有限公司 Log collection method and log collection system
CN112615920A (en) * 2020-12-18 2021-04-06 北京达佳互联信息技术有限公司 Abnormality detection method, abnormality detection device, electronic apparatus, storage medium, and program product
CN113760564A (en) * 2020-10-20 2021-12-07 北京沃东天骏信息技术有限公司 Data processing method, device and system
CN115296973A (en) * 2022-05-06 2022-11-04 北京数联众创科技有限公司 Method, device and application for batch collection and sending of front-end logs

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034541A (en) * 2012-11-16 2013-04-10 北京奇虎科技有限公司 Distributing type information system and equipment and method thereof
US20150254328A1 (en) * 2013-12-26 2015-09-10 Webtrends Inc. Methods and systems that categorize and summarize instrumentation-generated events
CN105608223A (en) * 2016-01-12 2016-05-25 北京中交兴路车联网科技有限公司 Hbase database entering method and system for kafka
CN105786683A (en) * 2016-03-03 2016-07-20 四川长虹电器股份有限公司 Customized log collecting system and method
CN106776249A (en) * 2016-11-28 2017-05-31 华迪计算机集团有限公司 A kind of processing method and system of the business diary for concurrently generating

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108989314A (en) * 2018-07-20 2018-12-11 北京木瓜移动科技股份有限公司 A kind of Transmitting Data Stream, processing method and processing device
CN109325200A (en) * 2018-07-25 2019-02-12 北京京东尚科信息技术有限公司 Obtain the method, apparatus and computer readable storage medium of data
CN109657125A (en) * 2018-12-14 2019-04-19 平安城市建设科技(深圳)有限公司 Data processing method, device, equipment and storage medium based on web crawlers
CN109525448A (en) * 2019-01-10 2019-03-26 北京智信未来信息技术有限公司 Log data acquisition system and method
WO2020211622A1 (en) * 2019-04-16 2020-10-22 深圳前海微众银行股份有限公司 Blockchain-based message storage method and device
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN110515619A (en) * 2019-08-09 2019-11-29 济南浪潮数据技术有限公司 Theme creation method, device and equipment and readable storage medium
CN110569112A (en) * 2019-09-12 2019-12-13 华云超融合科技有限公司 Log data writing method and object storage daemon device
CN110569112B (en) * 2019-09-12 2022-04-08 江苏安超云软件有限公司 Log data writing method and object storage daemon device
CN110688383A (en) * 2019-09-26 2020-01-14 中国银行股份有限公司 Data acquisition method and system
CN110889132A (en) * 2019-11-04 2020-03-17 中盈优创资讯科技有限公司 Distributed application permission verification method and device
CN111200637A (en) * 2019-12-20 2020-05-26 新浪网技术(中国)有限公司 Cache processing method and device
CN111200637B (en) * 2019-12-20 2022-07-08 新浪网技术(中国)有限公司 Cache processing method and device
CN111143314A (en) * 2019-12-26 2020-05-12 厦门服云信息科技有限公司 Log analysis method and system based on high-speed streaming processing technology
CN111625452A (en) * 2020-05-22 2020-09-04 上海哔哩哔哩科技有限公司 Flow playback method and system
CN111625452B (en) * 2020-05-22 2024-04-16 上海哔哩哔哩科技有限公司 Flow playback method and system
CN113760564A (en) * 2020-10-20 2021-12-07 北京沃东天骏信息技术有限公司 Data processing method, device and system
CN112527618A (en) * 2020-12-17 2021-03-19 中国农业银行股份有限公司 Log collection method and log collection system
CN112615920A (en) * 2020-12-18 2021-04-06 北京达佳互联信息技术有限公司 Abnormality detection method, abnormality detection device, electronic apparatus, storage medium, and program product
CN112615920B (en) * 2020-12-18 2023-03-14 北京达佳互联信息技术有限公司 Abnormality detection method, abnormality detection device, electronic apparatus, storage medium, and program product
CN112261069A (en) * 2020-12-22 2021-01-22 国网江苏省电力有限公司信息通信分公司 Message blacklist generation method for electric power internet of things management platform
CN115296973A (en) * 2022-05-06 2022-11-04 北京数联众创科技有限公司 Method, device and application for batch collection and sending of front-end logs
CN115296973B (en) * 2022-05-06 2024-10-22 北京清研兰亭科技有限公司 Method, device and application for collecting and sending front-end journals in batches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180622