CN112347326B - Crawler detection method and device based on browser end - Google Patents
- Publication number
- CN112347326B (application CN202011055400.4A)
- Authority
- CN
- China
- Prior art keywords
- browser
- user behavior
- behavior data
- characteristic
- characteristic value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The embodiment of the invention provides a crawler detection method and a crawler detection device based on a browser end, wherein the method comprises the following steps: if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script; and judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool. According to the embodiment of the invention, the user behavior data, the characteristic value of the browser and the characteristics of the automation tool are comprehensively considered, so that the crawler detection result is more accurate.
Description
Technical Field
The invention relates to the technical field of information security, in particular to a crawler detection method and device based on a browser end.
Background
With the development of computer technology and the arrival of the big data era, internet companies attach increasing importance to data security. At present, more and more browsers and automation tools are used for data acquisition, and protecting one's own data from being acquired by malicious crawlers has become a serious problem.
Existing crawler detection methods generally detect crawlers by statistically analyzing the logs of data requests. A data request log includes browser and system information, the IP address of the request, the web address requested, user behavior records, and so on. Abnormal information contained in the log is statistically analyzed to judge whether a crawler is acquiring data. Because the information contained in the log is limited, it is difficult to detect crawlers accurately from log information alone.
Disclosure of Invention
The embodiment of the invention provides a crawler detection method and device based on a browser end, which are used for solving the problem that crawler detection is inaccurate in the prior art and realizing accurate crawler detection.
The embodiment of the invention provides a crawler detection method based on a browser end, which comprises the following steps:
if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script;
and judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool.
According to the crawler detection method based on the browser end, the step of judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool comprises the following steps:
numerically quantizing the user behavior data, the characteristic value of the browser and the characteristics of the automation tool to obtain the values of the flag bits of the user behavior data, the characteristic value of the browser and the characteristics of the automation tool;
sorting and combining the flag bits of the user behavior data, the characteristic value of the browser and the characteristics of the automation tool according to the preset sequence of the user behavior data, the preset sequence of the characteristic value of the browser and the preset sequence of the characteristics of the automation tool, respectively;
and verifying the combination result corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, and judging whether the target data acquisition request is a crawler request according to the verification result.
According to the crawler detection method based on the browser end, the steps of quantifying the user behavior data, the characteristic value of the browser and the characteristic of the automation tool in a numerical manner and acquiring the user behavior data, the characteristic value of the browser and the flag bit value of the characteristic of the automation tool comprise:
if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool is detected, setting the value of the flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to 1;
and if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool is not detected, setting the value of the flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to 0.
According to the crawler detection method based on the browser end, the step of sorting and combining the user behavior data, the characteristic value of the browser and the flag bit of the characteristic of the automation tool according to the preset sequence of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool respectively comprises the following steps:
sorting the flag bits of the user behavior data according to a preset sequence of the user behavior data, performing a bit operation on the value of the flag bit of each user behavior data according to the sorting result, and accumulating the results;
sorting the flag bits of the characteristic values of the browser according to a preset sequence of the characteristic values of the browser, performing a bit operation on the value of the flag bit of each characteristic value of the browser according to the sorting result, and accumulating the results;
and sorting the flag bits of the features of the automation tool according to a preset sequence of the features of the automation tool, performing a bit operation on the value of the flag bit of each feature of the automation tool according to the sorting result, and accumulating the results.
According to the browser-side-based crawler detection method of one embodiment of the present invention, the steps of sorting the flag bits of the user behavior data according to the preset sequence of the user behavior data, and performing bit operation on the value of the flag bit of each user behavior data according to the sorting result and then accumulating the value specifically include:
grouping the user behavior data according to the category of the user behavior data;
sorting the flag bits of each group of user behavior data according to the preset sequence of the user behavior data;
and performing a bit operation on the values of the flag bits of each group of user behavior data according to the sorting result of the flag bits of that group, and accumulating the results.
According to the crawler detection method based on the browser end, the step of verifying the combination result corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool comprises the following steps:
recombining all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool, and taking the recombined results as the characteristics of the client;
and verifying the client characteristics under the target data acquisition request based on a verification function according to the client characteristics under the normal data acquisition request.
According to the crawler detection method based on the browser end, the step of recombining all combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool comprises the following steps:
and directly splicing all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, or splicing by using a preset identifier.
The embodiment of the invention also provides a crawler detection device based on the browser end, which comprises the following components:
the data acquisition module is used for acquiring user behavior data of a browser end sending a target data acquisition request and a characteristic value of the browser if the target data acquisition request is received, and detecting the characteristics of an automation tool of the browser end based on a JS script;
and the data checking module is used for judging whether the target data acquisition request is a crawler request according to the user behavior data, the characteristic value of the browser and the characteristics of an automation tool.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the steps of the browser-end-based crawler detection method are realized.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned browser-side-based crawler detection methods.
According to the crawler detection method and device based on the browser end, provided by the embodiment of the invention, whether the target data acquisition request is the crawler request is judged by comprehensively considering the user behavior data of the browser end sending the target data acquisition request, the characteristic value of the browser and the characteristics of an automation tool, so that the judgment result is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of a browser-side-based crawler detection method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart illustrating a complete browser-side-based crawler detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a browser-side-based crawler detection apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The browser-side-based crawler detection method according to the embodiment of the present invention is described below with reference to fig. 1, where the method includes: S101, if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script.
Specifically, when a target data acquisition request is received, the user behavior data of the browser end, the characteristic value of the browser and the characteristics of the automation tool of the browser end are collected in real time. The target data acquisition request may be a client request or a crawler request. The user behavior of the browser end is the operation a user performs at the browser end to acquire the target data, for example clicking the mouse, dragging the mouse, or pressing a key on the keyboard.
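As an illustration of collecting user behavior at the browser end, the following is a minimal sketch rather than the patented implementation. The helper name `trackUserBehavior` and the choice of event names are assumptions; the source only names clicking, dragging and key presses as examples.

```javascript
// Hypothetical helper: records which kinds of user events have been observed.
// `target` is anything exposing addEventListener (for example `document` in a browser).
function trackUserBehavior(target, eventNames) {
  const seen = {};
  for (const name of eventNames) {
    seen[name] = false; // nothing observed yet for this event type
    target.addEventListener(name, () => {
      seen[name] = true; // mark the behavior as detected
    });
  }
  return seen;
}
```

In a browser this could be called as `trackUserBehavior(document, ["click", "mousemove", "keydown"])`; the returned object is updated as events arrive.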
The characteristic value of the browser comprises the characteristic attribute of the browser and the performance information characteristic of the loaded webpage of the browser. The browser characteristic attribute may be a height and a width of a window of the browser, a color resolution of a browser Screen, a bit depth of a palette on a browser device or a buffer, a device angle of the browser, system power information, CPU (Central Processing Unit) related information, Screen object information, Document object information, and the like; the Performance information characteristics of the browser loaded webpage comprise webpage Performance measured by a Performance API (Application Programming Interface) access function and Performance information related to Performance Timing object access delay.
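A few of the browser attributes listed above can be read with standard browser properties. The sketch below is illustrative only: the function name `collectBrowserFeatures` and the particular subset of properties are assumptions, and the window-like object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Sketch: read some browser attributes from a window-like object.
// Pass the real `window` in a browser; missing properties come back as undefined.
function collectBrowserFeatures(win) {
  const screen = win.screen || {};
  const nav = win.navigator || {};
  return {
    windowHeight: win.innerHeight,      // height of the browser window
    windowWidth: win.innerWidth,        // width of the browser window
    colorDepth: screen.colorDepth,      // color resolution of the screen
    pixelDepth: screen.pixelDepth,      // bit depth of the screen
    cpuCores: nav.hardwareConcurrency,  // CPU-related information
    hasPerformanceApi:
      typeof win.performance === "object" && win.performance !== null,
  };
}
```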
The characteristics of the automation tool at the browser end can be detected through JS (JavaScript) scripts, and the detection comprises browser page element hiding detection, browser pull-up drive detection, browser object model detection, Webdriver browser drive information, browser Navigator object detection, browser Location object detection, browser Window object detection and the like.
User behavior data and characteristic values of the browser can be detected when a client sends a request, whereas features of an automation tool are detected only when a crawler sends a request. For example, a link hidden in the page is not displayed and cannot be accessed by a normal user, but a crawler may place the link in its crawl queue and initiate a request for it.
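Two of the automation-tool checks mentioned above can be sketched as follows. This is a hedged illustration, not the patent's script: `detectAutomationFeatures` and `requestedHiddenLink` are hypothetical names, and only the standard `navigator.webdriver` flag is inspected, whereas the source lists several further checks.

```javascript
// Sketch: `nav` is a navigator-like object (pass `navigator` in a browser).
// The standard `webdriver` property is true when the browser is under automation.
function detectAutomationFeatures(nav) {
  return {
    webdriver: nav.webdriver === true,
  };
}

// Hypothetical hidden-link check: a request for a link that is never shown
// to normal users suggests the requester followed it programmatically.
function requestedHiddenLink(requestedPath, hiddenPaths) {
  return hiddenPaths.includes(requestedPath);
}
```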
And S102, judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool.
Specifically, the user behavior data, the characteristic value of the browser and the features of the automation tool may each be judged separately before deciding whether the target data collection request is a crawler request; alternatively, they may first be combined, and the decision made according to the combined data.
In the embodiment, the user behavior data of the browser end sending the target data acquisition request, the characteristic value of the browser and the characteristics of the automation tool are comprehensively considered, and whether the target data acquisition request is a crawler request is judged, so that the judgment result is more accurate.
On the basis of the foregoing embodiment, in this embodiment, the step of determining whether the target data collection request is a crawler request according to the user behavior data, the characteristic value of the browser and the features of the automation tool includes: numerically quantizing the user behavior data, the characteristic value of the browser and the features of the automation tool to obtain the values of the flag bits of the user behavior data, the characteristic value of the browser and the features of the automation tool.
Specifically, to simplify verification, each collected user behavior data, characteristic value of the browser and feature of the automation tool is numerically quantized. During quantization, the value of each flag bit is determined by whether the corresponding user behavior data, characteristic value of the browser or feature of the automation tool is detected.
The flag bits of the user behavior data, the characteristic value of the browser and the features of the automation tool are then sorted and combined according to the preset sequence of the user behavior data, of the characteristic value of the browser and of the features of the automation tool, respectively; the combination results corresponding to the user behavior data, the characteristic value of the browser and the features of the automation tool are verified, and whether the target data collection request is a crawler request is judged according to the verification result.
After the collected user behavior data, characteristic values of the browser and features of the automation tool are numerically quantized, the resulting flag bits are verified to judge whether the target data collection request is a crawler request. A complete flowchart of the browser-side-based crawler detection method is shown in fig. 2. Each user behavior data, each characteristic value of the browser and each feature of the automation tool has one flag bit. The flag bits of all user behavior data are arranged in a preset sequence and combined, as are the flag bits of all characteristic values of the browser and the flag bits of all features of the automation tool. The order of the flag bits in each combination is set in advance so that, when a combination is verified, the flag bits at the same position under the target data collection request and under a normal data collection request correspond to the same type of feature data. The preset sequence can be determined according to the order in which each feature data is detected under a normal data collection request. In this case, once the values of the flag bits in each combination are verified, the order in which the feature data of that combination were detected can also be verified.
In this embodiment, the three kinds of collected feature information are numerically quantized, so that differences between the target data collection request and a normal data collection request in the flag bits of the same combination can be distinguished more intuitively. No specialist knowledge is needed: whether the target data collection request is a crawler request can be judged by directly verifying the combinations of flag bits of the feature data, which increases the speed of crawler detection.
On the basis of the foregoing embodiment, in this embodiment, the step of performing numerical quantization on the user behavior data, the feature value of the browser, and the feature of the automation tool to obtain the user behavior data, the feature value of the browser, and the flag bit value of the feature of the automation tool includes: if the user behavior data, the characteristic value of the browser or the characteristic of the automatic tool are detected, setting the flag bit value of the user behavior data, the characteristic value of the browser or the characteristic of the automatic tool to be 1; and if the user behavior data, the characteristic value of the browser or the characteristic of the automatic tool are not detected, setting the value of the flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automatic tool to be 0.
Specifically, if a user behavior data, a characteristic value of the browser or a feature of the automation tool is detected, the value of its flag bit is set to 1; otherwise, the value is set to 0. For example, when a mouse click in the user behavior data is detected, the value of the flag bit corresponding to the mouse click is set to 1; if it is not detected, the value of the flag bit is 0. Each combination of flag bits is then constructed according to the values of the flag bits and the preset order of the flag bits in that combination.
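The quantization rule above (detected gives 1, not detected gives 0, arranged in a preset order) can be sketched as follows. The function names and the feature names `click` and `drag` are illustrative assumptions.

```javascript
// Map a detection result to a flag-bit value: 1 if detected, 0 otherwise.
function toFlag(detected) {
  return detected ? 1 : 0;
}

// Build the ordered list of flag-bit values for one combination.
// `order` is the preset sequence of feature names; `observed` maps
// a feature name to whether that feature was detected.
function quantize(order, observed) {
  return order.map((name) => toFlag(Boolean(observed[name])));
}
```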
On the basis of the foregoing embodiment, in this embodiment, the step of sorting and combining the flag bits of the user behavior data, the characteristic value of the browser and the features of the automation tool according to their respective preset sequences includes: sorting the flag bits of the user behavior data according to a preset sequence of the user behavior data, performing a bit operation on the value of the flag bit of each user behavior data according to the sorting result, and accumulating the results; sorting the flag bits of the characteristic values of the browser according to a preset sequence of the characteristic values of the browser, performing a bit operation on the value of the flag bit of each characteristic value of the browser according to the sorting result, and accumulating the results; and sorting the flag bits of the features of the automation tool according to a preset sequence of the features of the automation tool, performing a bit operation on the value of the flag bit of each feature of the automation tool according to the sorting result, and accumulating the results.
Specifically, after the flag bits of the user behavior data are sorted, the position of each flag bit in the combination is determined. A bit operation is then performed on the value of each flag bit according to its position to obtain the code of each user behavior data, and finally the codes are accumulated to obtain the feature code of the user behavior data. The bit operation may be a bitwise left shift or a bitwise right shift. The feature code F of each combination is given by the following formula, where n is the total number of flag bits in the combination.
F = (Boolean(flag bit 1) << 1) + (Boolean(flag bit 2) << 2) + … + (Boolean(flag bit n) << n)
For example, suppose the preset order of the user behavior data is mouse click followed by mouse drag. If both behaviors are detected, the values of both flag bits are 1. According to Boolean(flag bit i) << j, where i identifies the flag bit, j is the position of the flag bit and << is the bitwise left-shift operator, the binary code of the mouse click is 10 and the binary code of the mouse drag is 100. The two codes are then added in binary to obtain the feature code 110.
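The shift-and-accumulate step can be sketched in a few lines. This is a minimal sketch using the left-shift variant with positions starting at 1, as in the worked example; the function name `encodeFlags` is an assumption, and because JavaScript shifts operate on 32-bit integers the sketch assumes at most 30 flag bits per combination.

```javascript
// Compute the feature code of one combination of flag bits.
// flags[0] is the flag bit at position 1, flags[1] at position 2, and so on.
function encodeFlags(flags) {
  let code = 0;
  flags.forEach((flag, index) => {
    const position = index + 1;            // positions are 1-based
    code += (flag ? 1 : 0) << position;    // shift the 0/1 value, then accumulate
  });
  return code;
}

// Mouse click and mouse drag both detected:
// encodeFlags([1, 1]).toString(2) gives "110"
```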
On the basis of the foregoing embodiment, the step of sorting the flag bits of the user behavior data according to the preset sequence of the user behavior data, performing a bit operation on the value of the flag bit of each user behavior data according to the sorting result and accumulating the results specifically includes: grouping the user behavior data according to the category of the user behavior data; sorting the flag bits of each group of user behavior data according to a preset sequence of the user behavior data; and performing a bit operation on the values of the flag bits of each group of user behavior data according to the sorting result of the flag bits of that group, and accumulating the results.
Specifically, one or more user behavior data are collected. If there are several, they can be grouped by category, and each group is then sorted and combined. For example, the user behavior data may be divided into mouse events, keyboard events and touch events. The user behavior data are first grouped; the flag bits of the user behavior data corresponding to the mouse events, the keyboard events and the touch events are then sorted and combined separately to obtain a feature code for each category of user behavior data. Whether the target data collection request is a crawler request is judged according to the verification results of the feature codes of each category of user behavior data, of the characteristic values of the browser and of the features of the automation tool. Similarly, the characteristic values of the browser and the features of the automation tool can also be grouped, and the flag bits of each group of feature data sorted and combined.
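Grouping by category before encoding might look like the sketch below. The category names, the event names and both function names are illustrative assumptions; the encoding uses the left-shift variant with 1-based positions.

```javascript
// Encode one group: sum of (flag << position) with 1-based positions.
function encodeGroup(order, observed) {
  return order.reduce(
    (code, name, index) => code + ((observed[name] ? 1 : 0) << (index + 1)),
    0
  );
}

// `categories` maps a category name to the preset order of its events.
// Returns one feature code per category.
function encodeByCategory(categories, observed) {
  const codes = {};
  for (const [category, order] of Object.entries(categories)) {
    codes[category] = encodeGroup(order, observed);
  }
  return codes;
}
```

For instance, `encodeByCategory({ mouse: ["click", "drag"], keyboard: ["keydown"] }, observed)` yields a separate code for mouse events and for keyboard events.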
On the basis of the foregoing embodiments, the step of verifying the combination result corresponding to the user behavior data, the feature value of the browser, and the feature of the automation tool in this embodiment includes: recombining all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool, and taking the recombined results as the characteristics of the client; and verifying the client characteristics under the target data acquisition request based on a verification function according to the client characteristics under the normal data acquisition request.
The normal data acquisition request is a request sent by a client. In the embodiment, the user behavior data, the characteristic value of the browser and the characteristics of the automation tool are comprehensively considered, and the corresponding characteristic codes are combined again according to the preset sequence to serve as the client characteristics. And verifying the characteristics of the client by adopting a verification function, and judging whether the target data acquisition request is a crawler request or not according to a verification result. The verification process is performed according to the client characteristics under the normal data acquisition request, and the client characteristics under the normal data acquisition request can be flexibly adjusted according to actual conditions.
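The source does not specify the verification function. One simple possibility, shown purely as an assumption, is to compare the client feature of the target request against the set of client features recorded under normal data acquisition requests. The name `makeVerifier` and the sample features are hypothetical.

```javascript
// Hypothetical verification function: the client feature passes if it equals
// one of the client features recorded under normal data acquisition requests.
function makeVerifier(normalFeatures) {
  const allowed = new Set(normalFeatures); // can be adjusted as conditions change
  return function verify(clientFeature) {
    return allowed.has(clientFeature);
  };
}
```

A request whose client feature fails verification would then be treated as a suspected crawler request; replacing the list passed to `makeVerifier` corresponds to adjusting the normal client features.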
On the basis of the foregoing embodiment, in this embodiment, the step of recombining all combination results corresponding to the user behavior data, the feature value of the browser, and the feature of the automation tool includes: and directly splicing all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool, or splicing by using a preset identifier.
Specifically, when the feature codes corresponding to the user behavior data, the characteristic value of the browser and the features of the automation tool are recombined, they may be spliced directly, or a preset identifier may be inserted between every two adjacent feature codes before splicing. For example, if the feature code corresponding to the mouse event is S, the feature code corresponding to the keyboard event is J, the feature code corresponding to the characteristic value of the browser is P, and the feature code corresponding to the features of the automation tool is M, the client feature obtained by direct splicing is W = S + J + P + M, and the client feature obtained by splicing with the preset identifier '|' is W = S + '|' + J + '|' + P + '|' + M.
For example, if the combination code of the user behavior data is 110, the combination code of the characteristic value of the browser is 1010, and the combination code of the features of the automation tool is 000, the combination codes can be spliced directly to obtain 1101010000; alternatively, a separator can be added between every two adjacent combination codes, giving the client feature 110|1010|000.
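Both splicing variants amount to a string join, as in this short sketch. The function name `spliceCodes` is an assumption, and the codes are handled as binary strings so that leading zeros such as 000 are preserved.

```javascript
// Splice feature codes into one client feature.
// With no separator the codes are concatenated directly;
// otherwise the preset identifier is placed between adjacent codes.
function spliceCodes(codes, separator = "") {
  return codes.join(separator);
}

// spliceCodes(["110", "1010", "000"])      gives "1101010000"
// spliceCodes(["110", "1010", "000"], "|") gives "110|1010|000"
```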
The browser-end-based crawler detection apparatus provided by the embodiment of the present invention is described below, and the browser-end-based crawler detection apparatus described below and the browser-end-based crawler detection method described above may be referred to in a corresponding manner.
As shown in fig. 3, the browser-based crawler detection apparatus provided in this embodiment includes a data acquisition module 301 and a data verification module 302;
the data acquisition module 301 is configured to, if a target data acquisition request is received, acquire user behavior data of a browser end that sends the target data acquisition request and a feature value of the browser, and detect a feature of an automation tool of the browser end based on a JS script;
specifically, when a target data collection request is received, user behavior data of a browser end, a characteristic value of the browser and a characteristic of an automation tool of the browser end are collected in real time. The target data acquisition request may be a client request or a crawler request. The user behavior of the browser end is the operation behavior of the user at the browser end for acquiring the target data.
The characteristic value of the browser comprises the characteristic attribute of the browser and the performance information characteristic of the loaded webpage of the browser. The browser characteristic attribute may be a window height and a width of a browser, a color resolution of a browser Screen, a bit depth of a palette on a browser device or a buffer, a device angle of the browser, system power information, CPU related information, Screen object information, Document object information, and the like; the Performance information characteristics of the browser loaded webpage comprise the Performance information related to the webpage Performance measured by the Performance API access function and the Performance information related to the Performance Timing object access delay.
The characteristics of the automation tool at the browser end can be detected through the JS script. They include browser page element hiding detection, browser pull-up driving detection, browser object model detection, WebDriver browser driver information, browser Navigator object detection, browser Location object detection, browser Window object detection, and the like.
The user behavior data and browser feature values can be detected only when the request comes from a client, and the automation tool features can be detected only when the request comes from a crawler. For example, a link hidden by the browser is not displayed in the page and is not accessible to normal users, but a crawler may place the link in its queue of links to be crawled and initiate a request for it.
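The hidden-link example can be illustrated with a minimal server-side sketch. It assumes the server keeps a set of link paths that are never shown to human visitors; the set contents and the function name `requests_hidden_link` are hypothetical and do not come from the patent.

```python
# Hypothetical set of link paths that the page hides from human visitors.
HIDDEN_LINK_PATHS = frozenset({"/trap/item-list", "/trap/archive"})


def requests_hidden_link(path: str) -> bool:
    """Return True if the requested path is one of the hidden links."""
    # Drop any query string and trailing slash so trivial variations still match.
    normalized = path.split("?", 1)[0].rstrip("/")
    return normalized in HIDDEN_LINK_PATHS


# A request for a hidden link is a signal that the requester followed a link
# a normal user could not have seen.
print(requests_hidden_link("/trap/item-list?page=2"))
print(requests_hidden_link("/products/42"))
```

In this sketch a hit is only one signal among several; the embodiment combines it with the other automation tool characteristics before judging the request.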
The data checking module 302 is configured to determine whether the target data collection request is a crawler request according to the user behavior data, a feature value of a browser, and features of an automation tool.
Specifically, the user behavior data, the characteristic value of the browser, and the characteristics of the automation tool may each be evaluated separately to determine whether the target data acquisition request is a crawler request; alternatively, the three may be combined, and whether the target data acquisition request is a crawler request is then determined from the combined data.
In this embodiment, the user behavior data of the browser end that sent the target data acquisition request, the characteristic value of the browser, and the characteristics of the automation tool are considered together when judging whether the target data acquisition request is a crawler request, which makes the judgment result more accurate.
On the basis of the foregoing embodiment, the data checking module in this embodiment is specifically configured to: numerically quantize the user behavior data, the characteristic value of the browser, and the characteristics of the automation tool to obtain their flag bits; sort and combine the flag bits of the user behavior data, of the characteristic value of the browser, and of the characteristics of the automation tool according to the preset sequence of each; and verify the combination results corresponding to the user behavior data, the characteristic value of the browser, and the characteristics of the automation tool, and judge whether the target data acquisition request is a crawler request according to the verification result.
On the basis of the foregoing embodiment, the detection module in this embodiment is specifically configured to: if an item of user behavior data, a characteristic value of the browser, or a characteristic of the automation tool is detected, set the flag bit of that item to 1; if it is not detected, set the flag bit of that item to 0.
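The flag-bit rule can be sketched as follows. This is a minimal illustration under one assumption: an item counts as "detected" when it is present in the collected data with a value other than `None`. The item names and the function name `quantize_flags` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def quantize_flags(collected: Dict[str, Optional[object]],
                   names: Sequence[str]) -> Dict[str, int]:
    """Map each named item to 1 if it was detected, otherwise 0."""
    # Assumption: "detected" means the item is present with a non-None value.
    return {name: 1 if collected.get(name) is not None else 0 for name in names}


# Hypothetical user behavior items reported for one request.
collected = {"mouse_move": [(10, 20), (11, 22)], "scroll": None}
flags = quantize_flags(collected, ["mouse_move", "key_press", "scroll"])
print(flags)
```

The same function could be applied separately to the browser characteristic values and to the automation tool characteristics, each with its own list of names.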
On the basis of the foregoing embodiment, the calculating module in this embodiment is specifically configured to: sort the flag bits of the user behavior data according to a preset sequence of the user behavior data, perform a bit operation on the flag bit of each item of user behavior data according to the sorting result, and accumulate the results; sort the flag bits of the characteristic values of the browser according to a preset sequence of the characteristic values of the browser, perform a bit operation on the flag bit of each characteristic value according to the sorting result, and accumulate the results; and sort the flag bits of the characteristics of the automation tool according to a preset sequence of the characteristics of the automation tool, perform a bit operation on the flag bit of each characteristic according to the sorting result, and accumulate the results.
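The text does not state which bit operation is used. One plausible reading, assumed here, is a left shift of each flag bit by its position in the preset sequence, followed by summation, which packs the flags into a single integer. The names below are hypothetical.

```python
from typing import Dict, Sequence


def combine_flags(flags: Dict[str, int], preset_order: Sequence[str]) -> int:
    """Pack flag bits into one integer using their position in preset_order."""
    total = 0
    for position, name in enumerate(preset_order):
        # Assumed bit operation: shift the flag left by its sorted position,
        # then accumulate. Items missing from `flags` count as 0.
        total += flags.get(name, 0) << position
    return total


order = ["mouse_move", "key_press", "scroll"]
result = combine_flags({"mouse_move": 1, "key_press": 0, "scroll": 1}, order)
print(result)  # 1*1 + 0*2 + 1*4 = 5
```

Because each flag occupies its own bit, the accumulated value identifies exactly which items were detected, provided the preset sequence is fixed.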
On the basis of the foregoing embodiment, the calculating module in this embodiment is further configured to: group the user behavior data according to the category of the user behavior data; sort the flag bits of each group of user behavior data according to the preset sequence of the user behavior data; and perform a bit operation on the flag bits of each group of user behavior data according to the sorting result of that group, and accumulate the results.
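The grouped variant can be sketched in the same way. The category labels, item names, and the shift-and-sum bit operation are assumptions made for illustration; the patent does not specify them.

```python
from typing import Dict, List, Sequence


def combine_by_group(flags: Dict[str, int],
                     categories: Dict[str, str],
                     preset_order: Sequence[str]) -> Dict[str, int]:
    """Return one packed integer per category of user behavior data."""
    grouped: Dict[str, List[str]] = {}
    # Walking the preset order keeps each group's items in their sorted order.
    for name in preset_order:
        grouped.setdefault(categories[name], []).append(name)
    return {
        category: sum(flags.get(name, 0) << pos for pos, name in enumerate(names))
        for category, names in grouped.items()
    }


order = ["mouse_move", "mouse_click", "key_down", "key_up"]
categories = {"mouse_move": "mouse", "mouse_click": "mouse",
              "key_down": "keyboard", "key_up": "keyboard"}
flags = {"mouse_move": 1, "mouse_click": 1, "key_down": 0, "key_up": 1}
grouped_result = combine_by_group(flags, categories, order)
print(grouped_result)  # mouse: 1 + 2 = 3, keyboard: 0 + 2 = 2
```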
On the basis of the foregoing embodiments, the data checking module in this embodiment is further configured to: recombine all the combination results corresponding to the user behavior data, the characteristic value of the browser, and the characteristics of the automation tool, and take the recombined result as the client characteristics; and verify, based on a verification function, the client characteristics under the target data acquisition request against the client characteristics under normal data acquisition requests.
On the basis of the above embodiment, the splicing module in this embodiment is specifically configured to: splice all the combination results corresponding to the user behavior data, the characteristic value of the browser, and the characteristics of the automation tool, either directly or by using a preset identifier.
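Splicing and verification can be sketched as below. The verification function is not defined in the text; this sketch assumes the simplest possible one, a membership test against client characteristics previously observed under normal requests. The separator `"-"` stands in for the preset identifier, and all names are hypothetical.

```python
from typing import Iterable, Sequence


def splice_results(results: Sequence[int], separator: str = "") -> str:
    """Join the combination results directly or with a preset identifier."""
    return separator.join(str(value) for value in results)


def is_crawler(client_feature: str, normal_features: Iterable[str]) -> bool:
    """Assumed verification: unknown client characteristics mean a crawler."""
    return client_feature not in set(normal_features)


# Combination results for user behavior, browser values, automation tool.
direct = splice_results([5, 3, 0])
marked = splice_results([5, 3, 0], "-")
normal = {"5-3-0", "7-3-0"}
print(direct, marked)
print(is_crawler(marked, normal), is_crawler("0-0-6", normal))
```

Direct splicing is compact but can be ambiguous when results have different digit counts (for example, 1 and 23 versus 12 and 3 both give "123"); a preset identifier between the parts avoids that.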
Fig. 4 illustrates a physical structure diagram of an electronic device. As shown in fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a browser-based crawler detection method comprising: if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script; and judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool.
In addition, when the logic instructions in the memory 430 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the browser-side-based crawler detection method provided by the above-mentioned method embodiments, where the method includes: if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script; and judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the browser-based crawler detection method provided by the above embodiments, the method including: if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script; and judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A crawler detection method based on a browser end is characterized by comprising the following steps:
if a target data acquisition request is received, acquiring user behavior data of a browser end sending the target data acquisition request and a characteristic value of the browser, and detecting the characteristics of an automation tool of the browser end based on a JS script;
judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of an automation tool;
the step of judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool comprises the following steps:
carrying out numerical quantification on the user behavior data, the characteristic value of the browser and the characteristic of the automation tool to obtain the values of the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool;
sorting and combining the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool according to the preset sequences of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, respectively;
verifying a combination result corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, and judging whether the target data acquisition request is a crawler request or not according to the verification result;
the step of carrying out numerical quantification on the user behavior data, the characteristic value of the browser and the characteristic of the automation tool to obtain the values of the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool comprises the following steps:
if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool is detected, setting the value of the flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to 1;
if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool is not detected, setting the value of the flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to 0;
the step of sorting and combining the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool according to the preset sequences of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, respectively, comprises the following steps:
sorting the flag bits of the user behavior data according to a preset sequence of the user behavior data, performing a bit operation on the value of the flag bit of each item of user behavior data according to the sorting result, and then accumulating;
sorting the flag bits of the characteristic values of the browser according to a preset sequence of the characteristic values of the browser, performing a bit operation on the value of the flag bit of each characteristic value of the browser according to the sorting result, and then accumulating;
and sorting the flag bits of the characteristics of the automation tool according to a preset sequence of the characteristics of the automation tool, performing a bit operation on the value of the flag bit of each characteristic of the automation tool according to the sorting result, and then accumulating.
2. The browser-side-based crawler detection method according to claim 1, wherein the step of sorting the flag bits of the user behavior data according to a preset sequence of the user behavior data, and performing bit operation on the flag bit values of each user behavior data according to a sorting result and then accumulating the flag bits comprises:
grouping the user behavior data according to the category of the user behavior data;
sorting the flag bits of each group of user behavior data according to a preset sequence of the user behavior data;
and performing a bit operation on the values of the flag bits of each group of user behavior data according to the sorting result of the flag bits of that group, and then accumulating.
3. The browser-based crawler detection method according to claim 1 or 2, wherein the step of verifying the combination result corresponding to the user behavior data, the characteristic value of the browser, and the characteristic of the automation tool comprises:
recombining all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristics of the automation tool, and taking the recombined results as the characteristics of the client;
and verifying the client characteristics under the target data acquisition request based on a verification function according to the client characteristics under the normal data acquisition request.
4. The browser-side based crawler detection method of claim 3, wherein the step of recombining all combination results corresponding to the user behavior data, the characteristic values of the browser, and the characteristics of the automation tool comprises:
and directly splicing all the combination results corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool, or splicing by using a preset identifier.
5. A crawler detection device based on a browser end is characterized by comprising:
the data acquisition module is used for acquiring user behavior data of a browser end sending a target data acquisition request and a characteristic value of the browser if the target data acquisition request is received, and detecting the characteristics of an automation tool of the browser end based on a JS script;
the data checking module is used for judging whether the target data acquisition request is a crawler request or not according to the user behavior data, the characteristic value of the browser and the characteristics of an automation tool;
the data checking module comprises:
the detection module is used for carrying out numerical quantification on the user behavior data, the characteristic value of the browser and the characteristic of the automation tool to obtain the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool;
the computing module is used for sorting and combining the flag bits of the user behavior data, the characteristic value of the browser and the characteristic of the automation tool according to the preset sequence of the user behavior data, the preset sequence of the characteristic value of the browser and the preset sequence of the characteristics of the automation tool, respectively;
the verification module is used for verifying a combination result corresponding to the user behavior data, the characteristic value of the browser and the characteristic of the automation tool and judging whether the target data acquisition request is a crawler request or not according to the verification result;
the detection module is specifically configured to: if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool is detected, setting a flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to be 1;
if the user behavior data, the characteristic value of the browser or the characteristic of the automation tool are not detected, setting a flag bit of the user behavior data, the characteristic value of the browser or the characteristic of the automation tool to be 0;
the calculation module is specifically configured to:
sorting the flag bits of the user behavior data according to a preset sequence of the user behavior data, performing a bit operation on the flag bit of each item of user behavior data according to the sorting result, and then accumulating;
sorting the flag bits of the characteristic values of the browser according to a preset sequence of the characteristic values of the browser, performing a bit operation on the flag bit of each characteristic value of the browser according to the sorting result, and then accumulating;
and sorting the flag bits of the characteristics of the automation tool according to a preset sequence of the characteristics of the automation tool, performing a bit operation on the flag bit of each characteristic of the automation tool according to the sorting result, and then accumulating.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the browser-based crawler detection method according to any one of claims 1 to 4 when executing the program.
7. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the browser-side based crawler detection method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011055400.4A CN112347326B (en) | 2020-09-29 | 2020-09-29 | Crawler detection method and device based on browser end |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112347326A CN112347326A (en) | 2021-02-09 |
CN112347326B true CN112347326B (en) | 2022-07-15 |
Family
ID=74361386
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011055400.4A Active CN112347326B (en) | 2020-09-29 | 2020-09-29 | Crawler detection method and device based on browser end |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347326B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678492A (en) * | 2013-11-13 | 2014-03-26 | 复旦大学 | Web click counting method based on web crawler behavior identification and buffering updating strategies |
CN105871850A (en) * | 2016-04-05 | 2016-08-17 | 携程计算机技术(上海)有限公司 | Crawler detection method and crawler detection system |
CN108282443A (en) * | 2017-01-05 | 2018-07-13 | 阿里巴巴集团控股有限公司 | A kind of reptile Activity recognition method and apparatus |
CN109150790A (en) * | 2017-06-15 | 2019-01-04 | 北京京东尚科信息技术有限公司 | The recognition methods of Web page crawler and device |
CN110909229A (en) * | 2019-11-27 | 2020-03-24 | 佛山科学技术学院 | Webpage data acquisition and storage system based on simulated browser access |
US10747881B1 (en) * | 2017-09-15 | 2020-08-18 | Palo Alto Networks, Inc. | Using browser context in evasive web-based malware detection |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110413859A (en) * | 2019-06-27 | 2019-11-05 | 平安科技(深圳)有限公司 | Webpage information search method, apparatus, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||