CN111258971A - Application state monitoring alarm system and method based on access log
- Publication number
- CN111258971A (application CN202010025168.3A)
- Authority
- CN
- China
- Prior art keywords
- application
- log
- module
- alarm
- application log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an application state monitoring and alarm system and method based on access logs. The system comprises a log collection module, a log subscription module, a filter chain component and a database module. The log collection module processes an application log file from a log source, converts its format into a data stream, and stores and publishes the data stream by topic as application log messages. The log subscription module subscribes to the application log messages by topic and pushes them to the filter chain component for processing. The filter chain component also accesses the database module to store, query and update the application log messages and their processing results. The filter chain component comprises an alarm module, which sends alarm information to the person responsible for the associated system according to the service state of the application log message and the association rule between the application and the system.
Description
Technical Field
The invention belongs to the technical field of distributed publish-subscribe messaging, and particularly relates to a technology for monitoring application state and raising alarms by using distributed subscription messages.
Background
At present, mobile internet technology is highly developed, and many everyday functions and affairs are handled through network services. As the number of internet users grows, a large website generally uses a plurality of application servers with the same role to form a distributed network system, so that user requests are distributed relatively evenly across the application servers and the load of the distributed network system is balanced.
In an existing distributed network system, the availability of the website needs to be monitored to ensure its normal operation. Website availability includes the system-level availability of each application server and the availability of the web application content that the server provides. For system-level availability, prior-art monitoring is relatively mature; for example, basic data such as load, network bandwidth, CPU, IO and memory can be monitored thoroughly. Monitoring the availability of application content is more complex. On the one hand, an error in the application content does not necessarily cause an error at the system level. On the other hand, such an error directly affects the accuracy of the information the user obtains, and application content can be abnormal in many ways. For example, a partial fault in the application program may make the application content displayed on the web page erroneous or incomplete.
To handle application content errors caused by the application program in a timely manner, several schemes are in use. These schemes and their technical problems are as follows.
1. Application monitoring and alarm technology. An application program can provide a custom health_check page that outputs the state of the application. If the application service becomes abnormal, monitoring the health_check page reveals the fault in time, and an alarm notifies technicians to analyze the cause. This technology suffers from insufficient coverage. Coverage depends on whether a monitoring client is deployed and whether the service has been added to the unified alarm rules. The result generally reflects the physical and network conditions of the monitoring client, not the access conditions of all real online users, so some scenarios may be missed. In a cluster environment in particular, the routes of the deployed monitoring node clients may not cover every machine, so the service state of some machines in the cluster cannot be monitored.
2. HTTP service monitoring. HTTP service monitoring requests the monitored resource at regular intervals, judges whether an abnormal condition exists from a defined return status and return value, and sends an alarm to the user if one exists. It is generally implemented by independently deploying a simulated client that imitates real requests to the application service. Its main defect is that the clients are limited and cannot cover all scenarios, and because of IP restrictions it is difficult to cover every service host in a cluster. Because existing systems are deployed as high-availability clusters, coverage often depends on the client node: for example, if the node IP is hashed to the same service machine or the same few machines, the running state of the other machines cannot be discovered. Conditions may also differ from one network operator to another, so the response of the service may differ.
3. Application log monitoring. To discover application service errors in time, most application services integrate a log framework such as log4j or logback. These frameworks define several log levels, such as debug, info, warn and error. System-level errors and exceptions are generally output at the error level, and monitoring based on the error log allows application administrators to discover and attend to problems in the application's service log in time. Although such applications monitor their service state through logs, they monitor and raise alarms only on the events or logs of their own service system. In many practical scenarios the service is unavailable because of factors outside the service system, such as an unstable network or interception by external security equipment, and the service system cannot perceive these problems. In addition, application-level monitoring is limited to exceptions of the service itself and has difficulty capturing whether a resource exists, whether a URL is wrong, timeout exceptions and the like. Such problems are especially likely with static resources and interface calls.
Disclosure of Invention
The invention addresses the following technical problems of the application service monitoring and alarm technologies in the prior art: 1) application-level monitoring and alarm technology is limited to exceptions of the service itself and has difficulty capturing whether a resource exists, whether a URL is wrong, timeout exceptions and the like, which are especially likely with static resources and interface calls; 2) coverage is insufficient, because monitoring coverage depends on the deployed monitoring client, which generally reflects its own physical and network conditions rather than the access conditions of all real online users, so some scenarios may be missed; 3) no systematic solution exists, so data statistics and analysis cannot be carried out through a back-end service, and availability cannot be estimated or optimized on that basis. To solve these problems, an application state monitoring and alarm method based on access logs is provided. Because the logs of all access ends are ingested directly, all access logs can be analyzed and the conditions of all real users can be covered.
In order to achieve the technical effect, the invention adopts the following technical scheme.
An application state monitoring and alarm system based on access logs comprises a log collection module, a log subscription module, a filter chain component and a database module, wherein,
the log collection module is used for processing an application log file from a log source, converting its format into a data stream, and storing and publishing the data stream by topic as application log messages;
the log subscription module is used for subscribing to the application log messages by topic and pushing them to the filter chain component for processing;
the filter chain component is also used for accessing the database module to store, query and update the application log messages and their processing results;
the filter chain component comprises an alarm module, which is used for sending alarm information to the person responsible for the associated system according to the service state of the application log message and the association rule between the application and the system.
In addition, the log collection module stores and publishes the logs in a cluster, the cluster comprises a plurality of server nodes, and each server node stores one or more partitions of the application log messages of a topic; the log subscription module acquires application log messages in pull mode, in which a certain number of application log messages is pulled each time according to a preset offset.
In addition, the filter chain component further comprises an HTTP state analyzer module, which determines whether the application log message falls within the alarm business rules by analyzing the HTTP code; the HTTP state analyzer is also used for filtering out static resources that are irrelevant to the service; and the HTTP state analyzer module determines the service state of the application log message through the HTTP code analysis and static resource filtering operations.
In addition, the filter chain component further comprises a user analyzer, which is connected to the alarm module and used for determining the contact information of the person responsible for the associated system according to a preset association rule between the application and the system. When the service state analysis of the application log message shows that an alarm is needed, the user analyzer returns the data requiring an alarm and the contact information of the responsible person to the alarm module, and the alarm module sends the related alarm information.
In addition, the filter chain component also comprises a statistical module, which classifies and statistically analyzes the application log messages according to dimensions such as login application, login address and login code, and provides basic data for later analysis.
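The classification performed by the statistical module can be illustrated with a minimal sketch. The field names (app, address, code) and the sample records are assumptions made for illustration; the source does not define a record layout.

```python
from collections import Counter

# Hypothetical, already-parsed application log messages.
# The field names "app", "address" and "code" are assumptions.
messages = [
    {"app": "shop", "address": "/api/order", "code": 500},
    {"app": "shop", "address": "/api/order", "code": 200},
    {"app": "portal", "address": "/login", "code": 403},
    {"app": "shop", "address": "/api/order", "code": 500},
]

def count_by(records, *dimensions):
    """Count records grouped by the given dimension fields."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

by_app_code = count_by(messages, "app", "code")
by_app = count_by(messages, "app")
print(by_app_code[("shop", 500)])
```

Grouping by a tuple of fields keeps the helper generic, so further dimensions can be added without changing the counting logic.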
In addition, the filter chain component further comprises a cache manager that caches and updates the base configuration information and communicates the base configuration information to the database.
In addition, the filter component also includes a database manager for connecting a relational database and an online analytics database using JDBC, providing fast database query and update services for other business modules of the filter component.
An application state monitoring alarm method based on an access log comprises the following steps:
A. processing an application log file from a log source, converting its format into a data stream, and storing and publishing the data stream by topic as application log messages;
B. subscribing to the application log messages by topic, pushing them to a filter chain component, and processing them;
C. sending alarm information to the person responsible for the associated system according to the service state of the application log message and the association rule between the application and the system; and
D. accessing the database module to store, query and update the application log messages and their processing results.
In addition, storing and publishing the data stream by topic as application log messages includes: storing and publishing the messages in a cluster, wherein the cluster comprises a plurality of server nodes, and each server node stores one or more partitions of the application log messages of a topic; and the step of subscribing to the application log messages by topic includes: acquiring the application log messages in pull mode, in which a certain number of application log messages is pulled each time according to a preset offset.
Wherein the process of processing the application log message includes determining a service state of the application log message, and the determining the service state of the application log message specifically includes:
b21, determining whether the application log message belongs to the alarm service rule by analyzing the HTTP code;
b22, filtering static resources not related to the service.
Drawings
Fig. 1 is a schematic structural diagram of an application status monitoring and alarming system based on an access log according to an embodiment of the present invention.
Fig. 2 is a flowchart of an application status monitoring and alarming method based on an access log according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a log collection module in an access log-based application status monitoring alarm system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Detailed exemplary embodiments are disclosed below. However, specific structural and functional details disclosed herein are merely for purposes of describing example embodiments.
It should be understood, however, that the intention is not to limit the invention to the particular exemplary embodiments disclosed, but to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure. Like reference numerals refer to like elements throughout the description of the figures.
The structures, proportions, sizes and the like shown in the drawings are provided only to accompany the content of this disclosure so that those skilled in the art can read and understand it. They do not limit the conditions under which the disclosure can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the efficacy or achievable purpose of the disclosure still falls within its scope. Likewise, the positional terms used in this specification are for clarity of description only and do not limit the scope of the invention; changes or adjustments of the relative relationships between them, without substantial change to the technical content, are also regarded as within the scope of the invention.
It will also be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. It will be further understood that when an element or unit is referred to as being "connected" or "coupled" to another element or unit, it can be directly connected or coupled to the other element or unit or intervening elements or units may also be present. Moreover, other words used to describe the relationship between components or elements should be understood in the same manner (e.g., "between" versus "directly between," "adjacent" versus "directly adjacent," etc.).
Fig. 1 is a schematic structural diagram of an application state monitoring and alarm system based on access logs according to an embodiment of the present invention. As shown in Fig. 1, the embodiment discloses an application state monitoring and alarm system based on access logs, which comprises a log collection module, a log subscription module, a filter chain component and a database module, wherein,
the log collection module is used for processing an application log file from a log source, converting its format into a data stream, and storing and publishing the data stream by topic as application log messages;
the log subscription module is used for subscribing to the application log messages by topic and pushing them to the filter chain component for processing;
the filter chain component is also used for accessing the database module to store, query and update the application log messages and their processing results;
the filter chain component comprises an alarm module, which is used for sending alarm information to the person responsible for the associated system according to the service state of the application log message and the association rule between the application and the system.
The method and system can therefore track, accurately and in real time, how online users access the service system while it is running. Because log location and log output are based directly on real online access, platform operators and technicians can find and locate problems quickly, which improves the stability and service quality of the whole service system.
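The way a message passes through a filter chain can be sketched as follows. This is an illustrative sketch only: the filter names, the message fields and the convention that a filter returns None to drop a message are assumptions, not details taken from the source.

```python
from typing import Callable, List, Optional

# A filter takes a message dict and returns it (possibly enriched),
# or None to drop it from further processing. This convention is an assumption.
Filter = Callable[[dict], Optional[dict]]

def run_chain(message: dict, filters: List[Filter]) -> Optional[dict]:
    """Pass a message through each filter in order; stop as soon as one drops it."""
    for f in filters:
        message = f(message)
        if message is None:
            return None
    return message

def drop_static(msg):
    """Hypothetical filter: discard requests for .ico files."""
    return None if msg["path"].endswith(".ico") else msg

def mark_alarm(msg):
    """Hypothetical filter: flag server-side errors."""
    return {**msg, "alarm": msg["status"] >= 500}

chain = [drop_static, mark_alarm]
result = run_chain({"path": "/api/pay", "status": 502}, chain)
dropped = run_chain({"path": "/favicon.ico", "status": 404}, chain)
```

Keeping each filter a plain callable means analyzers, statistics and alarm steps can be added or reordered without touching the chain runner.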
Fig. 3 is a schematic structural diagram of the log collection module in an application state monitoring and alarm system based on access logs according to an embodiment of the present invention. As shown in Fig. 3, in this embodiment the log collection module stores and publishes the logs in a cluster, where the cluster includes a plurality of server nodes, and each server node stores one or more partitions of the application log messages of a topic; the log subscription module acquires application log messages in pull mode, in which a certain number of application log messages is pulled each time according to a preset offset.
Typically, the log source is an Nginx server. Nginx is a high-performance HTTP and reverse proxy web server that also provides IMAP/POP3/SMTP proxy services. Here Nginx acts as the load balancing service: it can serve Rails and PHP programs directly, and it can also serve external requests as an HTTP proxy.
Nginx can generate log files recording the states and actions of a plurality of applications, covering all business systems behind it. The log files from the Nginx server are then extracted, parsed and format-converted using Heka. The formatted content is converted into a data stream, and the formatted application log data stream is sent through the Kafka component to the Kafka message service for storage.
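The extraction and format conversion step can be illustrated in plain Python. This sketch does not use Heka; it swaps in a regular expression to show the idea. It assumes Nginx's default "combined" access log format, which the source does not state, and the sample line is invented.

```python
import json
import re

# Nginx's default "combined" access log format (an assumption; the source
# does not say which log format is configured).
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Convert one raw access log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    return record

# Invented sample line for illustration.
sample = ('203.0.113.7 - - [10/Jan/2020:13:55:36 +0800] '
          '"GET /api/orders HTTP/1.1" 502 157 "-" "curl/7.64.0"')
record = parse_line(sample)
payload = json.dumps(record)  # serialized form that could be published to a topic
```

Lines that do not match return None instead of raising, so one malformed line does not interrupt the stream.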
The Kafka component has the following typical characteristics. Messages are persisted into topics. Unlike in a point-to-point messaging system, a consumer may subscribe to one or more topics and consume all data in those topics, the same piece of data may be consumed by multiple consumers, and data is not deleted immediately after being consumed. In a publish-subscribe messaging system, the producer of a message is called a publisher and the consumer is called a subscriber.
As shown in Fig. 3, 4 partitions are configured for the application log messages of a given topic. The 1st partition has 2 offsets (0 and 1); the 2nd partition has 4 offsets; the 3rd partition has 1 offset; the 4th partition has 3 offsets.
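The relationship between partitions and offsets can be illustrated with a toy in-memory model. It is not Kafka code: the class, the topic name and the message contents are hypothetical, and partitions are indexed from 0 here whereas the text counts from the 1st partition.

```python
class Topic:
    """Toy in-memory topic: each partition is an append-only list, and a
    message's offset is its index within its partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message and return the offset assigned to it."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

topic = Topic("app-access-log", 4)  # hypothetical topic name

# Reproduce the layout described for Fig. 3: 2, 4, 1 and 3 messages.
for partition, count in enumerate([2, 4, 1, 3]):
    for i in range(count):
        topic.append(partition, f"msg-{partition}-{i}")

# The next offset to be assigned in each partition equals its length.
next_offsets = [len(p) for p in topic.partitions]
```

Offsets are local to a partition, so the first partition holds offsets 0 and 1 regardless of how many messages the other partitions contain.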
If the number of copies configured for a topic is 4, Kafka creates 4 identical copies of each partition in the cluster. Each server node (Broker) in the cluster stores one or more partitions. Multiple publishers and consumers can produce and consume data simultaneously.
The server node that stores the first copy is called the Leader node, and the server nodes that store the other copies are called Follower nodes. First, Kafka partitions the received messages, and the messages of each topic are divided into different partitions. As a result, message storage is not limited by the storage space of a single server, and messages can be processed in parallel on multiple servers. Second, to ensure high availability, each partition has a certain number of copies. If some servers become unavailable, a server holding another copy can take over, which guarantees continuity of the application. For higher processing efficiency, all reads and writes of messages are completed on one fixed copy. The node holding this copy is the Leader node, and the nodes holding the other copies are Follower nodes. The Follower nodes periodically synchronize data from the Leader node.
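The takeover behaviour described above can be sketched with a deliberately simplified model. The rule used here, that the first live copy in the list becomes the leader, is an assumption for illustration; it does not reproduce how Kafka actually elects leaders, and the broker names are hypothetical.

```python
class ReplicatedPartition:
    """Toy model of one partition whose copies live on several brokers."""

    def __init__(self, replica_brokers):
        self.replicas = list(replica_brokers)  # the first entry holds the first copy
        self.alive = set(self.replicas)

    @property
    def leader(self):
        """Return the first live replica, which serves all reads and writes."""
        for broker in self.replicas:
            if broker in self.alive:
                return broker
        return None  # every copy is unavailable

    def fail(self, broker):
        """Mark a broker as unavailable."""
        self.alive.discard(broker)

partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
first_leader = partition.leader
partition.fail("broker-1")
new_leader = partition.leader
```

As long as one copy survives, reads and writes still have a node to go to, which is the continuity property the text describes.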
Log files from a plurality of applications are thus stored on the Kafka cluster under different topics and published through the Kafka cluster, which ensures both processing efficiency and high availability of the system.
The Kafka consumer, acting as the log subscription module, subscribes to the application log messages in Kafka and pushes them to the filter chain component at the core of the log processing component. To avoid an avalanche event, messages are acquired in pull mode, and each pull fetches messages according to a preset offset. The preset offset can be configured dynamically according to the service situation. For example, in one embodiment the offset is set to 10000, and the filter chain component can generally process these messages within 10 seconds, so real-time performance is good.
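The pull mode can be illustrated with a small in-memory simulation. It does not use a Kafka client: the functions and the simulated backlog are hypothetical, and only the figure of 10000 messages per pull comes from the embodiment above.

```python
def pull(log, offset, max_messages):
    """Return up to max_messages starting at offset, plus the next offset."""
    batch = log[offset:offset + max_messages]
    return batch, offset + len(batch)

def consume(log, handle, max_messages=10000):
    """Drain the log in bounded batches so that a burst never reaches
    the handler all at once. Returns the size of each batch pulled."""
    offset, batch_sizes = 0, []
    while True:
        batch, offset = pull(log, offset, max_messages)
        if not batch:
            break
        for message in batch:
            handle(message)
        batch_sizes.append(len(batch))
    return batch_sizes

backlog = [f"line-{i}" for i in range(25000)]  # simulated burst of log messages
seen = []
sizes = consume(backlog, seen.append)
```

Because the consumer decides when to ask for the next batch, a traffic spike lengthens the backlog in the message service instead of overloading the filter chain.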
The invention thus performs effective peak clipping. When traffic is heavy (for example, when many users access a plurality of applications and a large volume of log files is generated in a short time), peak clipping prevents a performance bottleneck in the log processing component from producing an avalanche effect.
Because prior-art systems are deployed as high-availability clusters, coverage often depends on the client node: for example, if the node IP is hashed to the same service machine or the same few machines, the running state of the other machines cannot be discovered. Conditions may also differ from one network operator to another, so the response of the service may differ. In the specific embodiment of the present invention, the logs of all application access ends are ingested directly, so all access logs can be analyzed and the conditions of all real users can be covered, which solves the problem of low coverage in the prior art.
In a specific embodiment of the present invention, the filter chain component further comprises an HTTP state analyzer module, wherein the HTTP state analyzer module determines whether the application log message belongs to the alarm business rule by analyzing an HTTP code; the HTTP state analyzer is also used for filtering static resources and filtering the static resources irrelevant to the service; and the HTTP state analyzer module determines the service state of the application log message through the HTTP code analysis and static resource filtering operation.
HTTP code analysis determines, from statistical results, which status codes require an alarm. Status codes for faults that are not serious are removed, and status codes that may seriously affect the service system are kept as the basis for alarms. This allows problems to be found in time while avoiding so many alarms that they lose their significance.
For example, log alarm rules are defined for 7 HTTP status codes: 401 (Unauthorized), 403 (Forbidden), 409 (Conflict), 499 (Timeout; the non-standard code Nginx logs when the client closes the connection), 500 (Internal Server Error), 502 (Bad Gateway) and 504 (Gateway Timeout). Some other fault-like codes, such as 400 (Bad Request), do not trigger an alarm.
These alarm-triggering status codes are not all handled in the same way. For a 401 or 403 alarm, the IP, PATH and other details of the alarm source need to be analyzed and are pushed to the system administrator for analysis. A 499 alarm needs to be analyzed before it is sent, because in many cases the browser also triggers 499 when the user manually cancels the request, and such alarms need to be avoided. For 500, 502 and 504, the system administrator is explicitly informed of the IP of the real server behind Nginx.
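The per-code handling described above can be sketched as a small classification function. The record field names (remote_addr, upstream_addr, client_aborted) are assumptions. In particular, client_aborted is a hypothetical flag: the source does not say how a manual cancellation is detected.

```python
# The 7 status codes named in the text as alarm-worthy.
ALARM_CODES = {401, 403, 409, 499, 500, 502, 504}

def classify(record):
    """Return an alarm dict for a parsed log record, or None if no alarm is needed."""
    status = record["status"]
    if status not in ALARM_CODES:
        return None  # e.g. 400 does not trigger an alarm
    if status == 499 and record.get("client_aborted"):
        return None  # hypothetical flag: the user cancelled the request manually
    alarm = {"status": status, "path": record["path"]}
    if status in (401, 403):
        # Include the alarm source so the administrator can analyze it.
        alarm["source_ip"] = record.get("remote_addr")
    elif status in (500, 502, 504):
        # Include the real server behind Nginx (assumed to be logged as upstream_addr).
        alarm["upstream_ip"] = record.get("upstream_addr")
    return alarm

forbidden = classify({"status": 403, "path": "/admin", "remote_addr": "198.51.100.4"})
bad_gateway = classify({"status": 502, "path": "/api", "upstream_addr": "10.0.0.5:8080"})
```

Returning None for non-alarm cases lets the function be used directly as a step in a filter chain.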
Static resource filtering means that the HTTP state analyzer actively filters out requests that are irrelevant to the service, such as crawler requests and the browser's automatic requests for .ico files, so that the service is not disturbed by noise that needs no attention.
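A filter of this kind can be sketched as follows. Only the .ico case comes from the text; the crawler markers and the field names are assumptions and would need to be adapted to the actual traffic.

```python
from urllib.parse import urlsplit

STATIC_SUFFIXES = (".ico",)                     # from the text; extend as needed
CRAWLER_MARKERS = ("bot", "spider", "crawler")  # assumed user-agent markers

def is_noise(record):
    """Return True if the request is irrelevant to the service and should be dropped."""
    path = urlsplit(record["path"]).path.lower()  # ignore any query string
    agent = record.get("user_agent", "").lower()
    if path.endswith(STATIC_SUFFIXES):
        return True
    return any(marker in agent for marker in CRAWLER_MARKERS)

favicon = is_noise({"path": "/favicon.ico?v=2", "user_agent": "Mozilla/5.0"})
crawler = is_noise({"path": "/api/orders", "user_agent": "Googlebot/2.1"})
normal = is_noise({"path": "/api/orders", "user_agent": "Mozilla/5.0"})
```

Matching on the path component only means a query string such as ?v=2 does not hide a static request from the filter.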
Therefore, in this embodiment of the present invention, the HTTP state analyzer determines the service state of the application log message through HTTP status code analysis and static resource filtering, and identifies the source of the application log and the service system by analyzing the loader.
In a specific embodiment of the present invention, the filter chain component further includes a user analyzer connected to the alarm module. The user analyzer determines the contact information of the system-associated responsible person according to a preset association rule between the application and the system. When the service state analysis of an application log message indicates that an alarm is required, the user analyzer returns the data to be alarmed and the contact information of the system-associated responsible person to the alarm module, which sends the corresponding alarm information.
In the present invention, the association rule is first used to map the application to its system, and the system is then traced to its system-associated responsible person. The contact information of the system-associated responsible person can be stored in a MySQL database, for example by associating a domain name with the system-associated responsible person.
The alarm module provides two types of alarm service: one alarms by mobile phone and the other by e-mail. If the analysis result of an application log message indicates that an alarm is required, the alarm module sends the alarm to the designated system-associated responsible person according to its own rules.
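The lookup from domain name to responsible person, followed by delivery over the two alarm channels, might look roughly like the sketch below. The `Contact` type, the `OWNERS` table, the example domain, and the `notify` function are all hypothetical; an in-memory dict stands in for the database table mentioned in the text, and the two channels are passed in as plain callables rather than real SMS or mail clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Contact:
    name: str
    phone: str
    email: str


# Hypothetical association table: domain name -> responsible person.
# A dict stands in here for the database table described in the text.
OWNERS: Dict[str, Contact] = {
    "shop.example.com": Contact("Alice", "+10000000000", "alice@example.com"),
}

Sender = Callable[[str, str], None]  # (recipient, message) -> None


def notify(domain: str, message: str, by_phone: Sender, by_mail: Sender) -> bool:
    """Send the alarm over both channels; return False if no owner is known."""
    contact = OWNERS.get(domain)
    if contact is None:
        return False
    by_phone(contact.phone, message)
    by_mail(contact.email, message)
    return True
```

Passing the senders in as arguments keeps the sketch free of any particular SMS or mail library and makes the routing logic easy to exercise in isolation.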
Therefore, the method and system can accurately capture, in real time, how online users are accessing the business system while it is running. Because log tracing and log output are based directly on real online access, they help platform operators and technicians quickly find and locate problems, improving the stability and service quality of the whole business system. In the specific embodiment of the invention, the application logs of all access terminals are ingested directly, so all access-log conditions can be analyzed and all external conditions can be reached. This solves the problem in the prior art that only the events or logs of the business system itself are monitored and alarmed on, whereas in practice services often become unavailable because of factors external to the business system, such as an unstable network or interception by external security equipment, which the prior-art business system cannot perceive. In addition, application-level monitoring technology is limited to service exceptions and has difficulty capturing whether a resource exists, whether a URL is wrong, timeout exceptions, and the like; such problems occur with high probability for static resources and interface calls in particular.
In addition, in a specific embodiment of the present invention, the filter chain component further includes a statistical module. The statistical module classifies and statistically analyzes the data of application log messages along dimensions such as login application, login address, and login code, providing basic data for later analysis.
The statistical module implements SLA data analysis: statistics classified by login application (logger) are used to analyze the application dimension; the login address (IP) is used to analyze the regional distribution of alarms; and the login code is used to grade alarms, with each login code representing a different state and error level.
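As a rough illustration of this classification, the sketch below counts messages along the three dimensions named in the text. The message layout (a dict with `application`, `address`, and `code` keys) and the `summarize` function are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

DIMENSIONS = ("application", "address", "code")


def summarize(messages: Iterable[Mapping]) -> Dict[str, Counter]:
    """Count application log messages per application, address, and code."""
    stats = {dimension: Counter() for dimension in DIMENSIONS}
    for message in messages:
        for dimension in DIMENSIONS:
            stats[dimension][message[dimension]] += 1
    return stats
```

With counts like these, `stats["code"].most_common()` would give a simple ranking of status codes, which is the kind of basic data the later rule adjustments rely on.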
The data analysis performed by the statistical module, in particular the classification and statistics along the login application, login address, and login code dimensions, provides data to guide further adjustment and improvement of the access-log-based application state monitoring alarm system and method of this embodiment and its adaptation to more embodiments. For example, based on the statistics for login codes, some codes can be added to the set of codes requiring an alarm, and some codes that originally required an alarm can be removed.
For example, if login addresses from a certain region frequently produce false alarms for a certain code, the HTTP state analyzer in this embodiment may adjust its code analysis policy (alarm business rules) and mark that code as not requiring an alarm for login addresses from that region. Alternatively, if alarms for a certain code occur frequently in a certain login application, the HTTP state analyzer may prioritize expanding the static filtering range for that application, or take a more flexible approach, such as excluding that code for that login application from the alarm range for a period of time. This avoids overloading the system and prevents the system-associated responsible person from receiving alarm information too frequently.
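One way to picture the time-limited exclusion mentioned above is a small rule table keyed by application and code, as in the sketch below. The `SuppressionRules` class and its method names are hypothetical, and the injectable clock exists only to make the behavior easy to check.

```python
import time
from typing import Callable, Dict, Tuple


class SuppressionRules:
    """Temporarily exclude (application, code) pairs from the alarm range."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._until: Dict[Tuple[str, int], float] = {}

    def suppress(self, application: str, code: int, seconds: float) -> None:
        """Stop alarming on this code for this application for a while."""
        self._until[(application, code)] = self._clock() + seconds

    def is_suppressed(self, application: str, code: int) -> bool:
        """Return True while the exclusion period is still running."""
        expiry = self._until.get((application, code))
        return expiry is not None and self._clock() < expiry
```

A region-based rule could be handled the same way by keying on the region of the login address instead of the application; how regions are derived from addresses is not specified in the text.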
Therefore, through the data analysis function of the statistical module, the correspondence among login application, login address, and login code can be adjusted flexibly, for example giving the HTTP state analyzer and the alarm module a flexible way to tune alarm decisions, which further improves the flexibility and reliability of this embodiment.
In addition, in the specific embodiment of the present invention, the filter chain component further includes a cache manager, which caches and updates the basic configuration information and transmits it to the database.
The basic configuration includes alarm configuration associations, domain name configuration, and the like. The cache manager provides fast data service for the statistical module, the HTTP state analyzer, and the alarm module. In particular, the HTTP state analyzer and the alarm module must process a large volume of application log messages (for example, 10,000 messages in 10 seconds), so the alarm configuration associations, the domain name configuration, the correspondence between error codes and alarms, and the like must be obtainable quickly.
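The text does not say how the cache manager is built. A minimal sketch, assuming a simple time-to-live policy, is given below; the `ConfigCache` class, its `loader` callback (standing in for a database read), and the default TTL are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ConfigCache:
    """Serve configuration from memory, reloading entries after a TTL."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader    # hypothetical callback that reads the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached value, reloading it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```

At the message rate quoted in the text, serving repeated lookups from memory rather than querying the database for every message is the point of such a component.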
In addition, in the specific embodiment of the present invention, the filter component further includes a database manager. The database manager uses JDBC to connect to the relational database and the online analysis database, and provides fast database query and update services for the other business modules of the filter component.
In the specific embodiment of the present invention, the databases include a MySQL relational database and a TiDB online analysis database. A relational database management system is the set of programs (database management system software) that manages a relational database, logically organizing interrelated data and providing access to it. The TiDB online analysis database can be deployed locally or on cloud platforms and supports public, private, and hybrid clouds. The user can choose a suitable mode for deploying the TiDB cluster according to the actual scenario or requirements.
Deploying the data separately in the MySQL relational database and the TiDB online analysis database makes query and update by the other business modules of the filter component convenient, and also guarantees throughput for large volumes of data, with cloud storage covering all application log messages.
Precisely because the embodiment of the present invention is not limited to processing log files generated by a single application, but subscribes to, analyzes, processes, and compiles statistics on all application log files from Nginx, all application log conditions can be analyzed and all information external to the application itself can be reached, providing a real and effective basis for application state monitoring and alarming.
Fig. 2 is a flowchart of an application state monitoring alarm method based on an access log according to an embodiment of the present invention. As shown in Fig. 2, corresponding to the access-log-based application state monitoring system of the embodiment, an access-log-based application state monitoring alarm method is also provided, comprising the steps of:
A. processing an application log file from a log source and converting its format into a data stream, and storing and publishing the data stream as application log messages by topic;
B. subscribing to the application log messages by topic, pushing them to a filter chain component, and processing the application log messages;
C. sending alarm information to the system-associated responsible person according to the service state of the application log message and the association rule between the application and the system; and
D. accessing the database module to access, query, and update the application log messages and their processing results.
It should be noted that although steps C and D are depicted in Fig. 2 as sequential, sending alarm information and accessing the database module have no fixed order; either may occur at any time during application log message processing.
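Step A, converting a raw log line into a structured message, could be sketched as follows. The sketch assumes Nginx's default "combined" access log format, which the text does not specify; the `to_message` function, the topic name, and the field names are likewise illustrative.

```python
import re
from typing import Optional

# Pattern for Nginx's "combined" format (an assumption; a real deployment
# may use a custom log_format with different fields).
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_message(line: str, topic: str = "nginx-access") -> Optional[dict]:
    """Parse one access-log line into a message tagged with a topic."""
    match = LINE.match(line)
    if match is None:
        return None  # line is not in the expected format
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return {"topic": topic, "payload": fields}
```

Lines that do not match return `None` rather than raising, so a collector built this way could skip malformed entries without interrupting the stream.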
In addition, in the specific embodiment of the present invention, storing and publishing the data stream as application log messages by topic includes storing and publishing the messages in a cluster, wherein the cluster comprises a plurality of server nodes and each server node stores one or more partitions of a topic's application log messages. The step of subscribing to the application log messages by topic includes acquiring the application log messages in pull mode, in which a certain number of application log messages are pulled each time according to a preset offset.
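The partition and offset mechanics can be illustrated with a small in-memory model. This is only a sketch of the idea and not a client for any real message broker: the `Topic` class, its methods, and the key-based partition choice are assumptions, and a single process stands in for the cluster of server nodes.

```python
from typing import List


class Topic:
    """In-memory stand-in for a topic whose messages are split into partitions."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        self._partitions: List[List[str]] = [[] for _ in range(partitions)]

    def publish(self, key: str, message: str) -> int:
        """Append the message to a partition chosen from the key."""
        # A byte sum keeps the choice deterministic across runs.
        index = sum(key.encode()) % len(self._partitions)
        self._partitions[index].append(message)
        return index

    def pull(self, partition: int, offset: int, max_messages: int) -> List[str]:
        """Return up to max_messages starting at the given offset."""
        return self._partitions[partition][offset:offset + max_messages]
```

A subscriber would call `pull` repeatedly, advancing its offset by the number of messages returned each time, so it controls its own consumption rate instead of having messages pushed to it.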
In addition, in a specific embodiment of the present invention, processing the application log message includes determining the service state of the application log message, which specifically includes:
b21, determining whether the application log message falls under the alarm business rules by analyzing the HTTP status code;
b22, filtering out static resources not related to the service.
While the foregoing description shows and describes several preferred embodiments of the invention, it is to be understood that the invention is not limited to the forms disclosed herein. These embodiments are not to be construed as excluding other embodiments; the invention can be used in various other combinations, modifications, and environments, and can be changed within the scope of the inventive concept described herein in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the protection of the appended claims.
Claims (10)
1. An application state monitoring and alarming system based on an access log comprises a log collection module, a log subscription module, a filter chain component and a database module, wherein,
the log collection module is used for processing an application log file from a log source and converting its format into a data stream, and storing and publishing the data stream as application log messages by topic;
the log subscription module is used for subscribing to the application log messages by topic, pushing the application log messages to the filter chain component, and processing the application log messages;
the filter chain component is also used for accessing the database module and accessing, inquiring and updating the application log message and the processing result thereof;
the filter chain component comprises an alarm module which is used for sending alarm information to a system association responsible person according to the service state of the application log message and the association rule of the application and the system.
2. The access log based application state monitoring alarm system of claim 1, wherein the log collection module stores and publishes log messages in a cluster, the cluster comprising a plurality of server nodes, each server node storing one or more partitions of a topic's application log messages; the log subscription module acquires application log messages in a pull mode, wherein the pull mode pulls a certain number of application log messages each time according to a preset offset.
3. The access log based application state monitoring alarm system of claim 1, wherein the filter chain component further comprises an HTTP state analyzer module that determines whether an application log message falls under an alarm business rule by analyzing the HTTP status code; the HTTP state analyzer is also used for static resource filtering, filtering out static resources irrelevant to the service; and the HTTP state analyzer module determines the service state of the application log message through the HTTP status code analysis and the static resource filtering operation.
4. The application state monitoring alarm system based on the access log as claimed in claim 1, wherein the filter chain component further comprises a user analyzer, the user analyzer is connected to the alarm module and is configured to determine the contact information of the system associated responsible person according to a preset association rule between the application and the system, when the service state analysis result of the application log message is that an alarm is required, the data required to be alarmed and the contact information of the system associated responsible person are returned to the alarm module, and the alarm module sends related alarm information.
5. The access log based application state monitoring alarm system of claim 1, wherein the filter chain component further comprises a statistics module, wherein the statistics module is configured to classify and statistically analyze application log messages, and the statistics module classifies and statistically analyzes application log messages according to dimensions such as login application, login address, login code, and the like, and provides basic data for analysis at a later stage.
6. The access log based application state monitoring alarm system of claim 1, wherein the filter chain component further comprises a cache manager that caches and updates the base configuration information and communicates the base configuration information to a database.
7. The access log based application state monitoring alarm system of claim 1, wherein the filter component further comprises a database manager that uses JDBC to connect to relational databases and online analysis databases and provides fast database query and update services for other business modules of the filter component.
8. An application state monitoring alarm method based on an access log comprises the following steps:
A. processing an application log file from a log source and converting its format into a data stream, and storing and publishing the data stream as application log messages by topic;
B. subscribing to the application log messages by topic, pushing them to a filter chain component, and processing the application log messages;
C. sending alarm information to the system-associated responsible person according to the service state of the application log message and the association rule between the application and the system; and
D. accessing the database module to access, query, and update the application log messages and their processing results.
9. The access log based application state monitoring alarm method of claim 8, wherein storing and publishing the data stream as application log messages by topic comprises: storing and publishing the messages in a cluster, wherein the cluster comprises a plurality of server nodes, and each server node stores one or more partitions of a topic's application log messages; and the step of subscribing to the application log messages by topic comprises: acquiring the application log messages in a pull mode, wherein the pull mode pulls a certain number of application log messages each time according to a preset offset.
10. The access log based application state monitoring alarm method according to claim 8, wherein processing the application log message includes determining a service state of the application log message, and determining the service state of the application log message specifically includes:
b21, determining whether the application log message falls under the alarm business rules by analyzing the HTTP status code;
b22, filtering out static resources not related to the service.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010025168.3A CN111258971A (en) | 2020-01-10 | 2020-01-10 | Application state monitoring alarm system and method based on access log |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111258971A true CN111258971A (en) | 2020-06-09 |
Family
ID=70950310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010025168.3A Pending CN111258971A (en) | 2020-01-10 | 2020-01-10 | Application state monitoring alarm system and method based on access log |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111258971A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111884882A (en) * | 2020-07-29 | 2020-11-03 | 北京千丁互联科技有限公司 | Monitoring coverage rate detection method and device |
CN113377791A (en) * | 2021-06-10 | 2021-09-10 | 北京齐尔布莱特科技有限公司 | Data processing method, system and computing equipment |
CN113778780A (en) * | 2020-11-27 | 2021-12-10 | 北京京东尚科信息技术有限公司 | Application stability determination method and device, electronic equipment and storage medium |
CN113868083A (en) * | 2021-09-24 | 2021-12-31 | 猪八戒股份有限公司 | Method for realizing intelligent flow switching based on real-time analysis of application request logs |
CN114024838A (en) * | 2021-11-26 | 2022-02-08 | 北京天融信网络安全技术有限公司 | Log processing method and device and electronic equipment |
CN114490711A (en) * | 2022-01-11 | 2022-05-13 | 盈立数智科技(深圳)有限公司 | Streaming system market information screening method based on reflection mechanism |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070283194A1 (en) * | 2005-11-12 | 2007-12-06 | Phillip Villella | Log collection, structuring and processing |
CN105224445A (en) * | 2015-10-28 | 2016-01-06 | 北京汇商融通信息技术有限公司 | Distributed tracking system |
CN108874558A (en) * | 2018-05-31 | 2018-11-23 | 康键信息技术(深圳)有限公司 | News subscribing method, electronic device and the readable storage medium storing program for executing of distributed transaction |
CN109347665A (en) * | 2018-10-07 | 2019-02-15 | 杭州安恒信息技术股份有限公司 | A kind of Website Usability alarm method and its system based on web log |
CN110262807A (en) * | 2019-06-20 | 2019-09-20 | 北京百度网讯科技有限公司 | Cluster creates Progress Log acquisition system, method and apparatus |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111884882A (en) * | 2020-07-29 | 2020-11-03 | 北京千丁互联科技有限公司 | Monitoring coverage rate detection method and device |
CN113778780A (en) * | 2020-11-27 | 2021-12-10 | 北京京东尚科信息技术有限公司 | Application stability determination method and device, electronic equipment and storage medium |
CN113778780B (en) * | 2020-11-27 | 2024-05-17 | 北京京东尚科信息技术有限公司 | Application stability determining method and device, electronic equipment and storage medium |
CN113377791A (en) * | 2021-06-10 | 2021-09-10 | 北京齐尔布莱特科技有限公司 | Data processing method, system and computing equipment |
CN113868083A (en) * | 2021-09-24 | 2021-12-31 | 猪八戒股份有限公司 | Method for realizing intelligent flow switching based on real-time analysis of application request logs |
CN113868083B (en) * | 2021-09-24 | 2024-07-16 | 猪八戒股份有限公司 | Method for realizing intelligent flow switching based on real-time analysis of application request log |
CN114024838A (en) * | 2021-11-26 | 2022-02-08 | 北京天融信网络安全技术有限公司 | Log processing method and device and electronic equipment |
CN114490711A (en) * | 2022-01-11 | 2022-05-13 | 盈立数智科技(深圳)有限公司 | Streaming system market information screening method based on reflection mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220505. Address after: 100080 Beijing Haidian District Zhongguancun Street 27, 16 floor 1601 room. Applicant after: Beijing Nongxin Shuzhi Technology Co.,Ltd. Address before: 100080 1601 16 street, 27 Zhongguancun street, Haidian District, Beijing. Applicant before: BEIJING NONGXIN INTERNET TECHNOLOGY GROUP Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200609 |