CN113612639B - Method and device for analyzing and predicting file downloading behavior based on website access record - Google Patents

Method and device for analyzing and predicting file downloading behavior based on website access record

Info

Publication number
CN113612639B
Authority
CN
China
Prior art keywords
access
file downloading
behavior
website
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110871515.9A
Other languages
Chinese (zh)
Other versions
CN113612639A (en
Inventor
翟欣虎
秦益飞
杨正权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yianlian Network Technology Co ltd
Original Assignee
Jiangsu Yianlian Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yianlian Network Technology Co ltd
Priority to CN202110871515.9A
Publication of CN113612639A
Application granted
Publication of CN113612639B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a method for analyzing and predicting file downloading behavior based on website access records, comprising the following steps: acquiring website access records of at least one user accessing a target website; grouping the website access records by user to obtain the personal access records of each user, and extracting pre-file-download feature sequences; grouping the personal access records by time period to obtain the time period access records of each time period, and extracting non-file-download feature sequences; and inputting the pre-file-download feature sequences and the non-file-download feature sequences into a trained first neural network model to predict the probability that a user of the target website will perform a file downloading behavior. By analyzing the website access records of users on the target website and extracting these two kinds of feature sequences, the method lets a neural network model learn the users' file downloading behavior patterns, and the trained model predicts the probability of file downloading behavior on the target website.

Description

Method and device for analyzing and predicting file downloading behavior based on website access record
Technical Field
The application relates to the technical field of network security audit, in particular to a method and a device for analyzing and predicting file downloading behaviors based on website access records.
Background
With the growing popularity of networks, new kinds of illegal and criminal acts carried out over networks are increasing day by day. Network security audit serves to strengthen and standardize internet security protection work, safeguard network security and information security, promote the healthy and orderly development of the internet, and maintain national security, social order and public interests.
Detecting, analyzing and controlling users' downloading behavior and the files they download is an important part of network security audit. Generally, the most accurate record of downloading behavior is the download record on the terminal device used by the user, but an operator cannot obtain data from the user's terminal device by simple means. The most practical approach is therefore to analyze the website access records generated by the operator's server after the user accesses it, and to derive the user's downloading behavior data from those records.
However, the existing TCP/IP protocol does not clearly define the downloading operation, and application websites do not record downloading behavior in a uniform way, so it is difficult for an operator to determine during a download audit whether a user has performed a downloading behavior.
In addition, file downloading behavior is currently identified from website access records by the name of the requested resource. For example, when the file suffix in the name of the requested resource is a keyword such as doc, pdf, zip, rar or jpg, the request is treated as a file downloading behavior. The false alarm rate of this statistical method is very high, and the detected number of file downloads is much larger than the number of files the user actually downloaded.
To address this, a further screening rule is often superimposed, for example treating a request as a file downloading behavior only if the size of the requested resource exceeds a certain threshold, but the false alarm rate remains high. There is no criterion for choosing the size threshold: a request for a small resource may still be a file download, and a request for a resource above the threshold may still not be one.
Disclosure of Invention
In a first aspect, a method for analyzing and predicting file downloading behaviors based on website access records is provided, and according to the method, a website access record of a user in a website is analyzed, a characteristic sequence before file downloading and a non-file downloading characteristic sequence are extracted from the website access record, a neural network model is made to learn a file downloading behavior pattern of the user, and a neural network is trained to predict the occurrence probability of the file downloading behaviors of the user in the website.
Specifically, the method comprises the following steps:
acquiring a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior;
grouping the website access records of the target website by user to obtain personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records by time period to obtain time period access records corresponding to each time period, and extracting a plurality of consecutive access records from the time period access records that do not contain the file downloading behavior as a non-file-download feature sequence;
inputting the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model, and predicting the probability of occurrence of the file downloading behavior of the target website user.
The first neural network model comprises a recurrent neural network. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, which records the feature information and order information of each node in the two sequences and converts the feature information and order information into an information matrix.
A user may perform a series of associated access behaviors before downloading a file, so the plurality of consecutive website access records preceding the website access record of every file downloading behavior need to be analyzed. However, predicting file downloading behavior only from the pre-file-download feature sequence and the non-file-download feature sequence relies on a single kind of feature, so features of more dimensions need to be added to improve the accuracy of the final prediction.
Thus, the method further comprises:
extracting additional feature vectors corresponding to the access behaviors according to the personal access records, wherein the additional feature vectors comprise all-day distribution feature vectors, periodic feature vectors, type distribution feature vectors and adjacent feature vectors, and generating file downloading additional feature vectors and non-file downloading additional feature vectors;
inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability of occurrence of file download behavior of the target website user.
The all-day distribution feature vector is the proportion of the access behaviors in all time periods in the all day in the total number of the behaviors; the periodic feature vector is a maximum time interval in which the access behavior periodically occurs; the type distribution feature vector is the proportion of the access behaviors in the total number of behaviors; the neighboring feature vector is the number of access behaviors.
In addition, in order to mark the access behavior of the website access records quickly, the method further comprises the following steps: marking the website access records with access behaviors, wherein the access behavior marks at least comprise a file downloading behavior; and establishing a correspondence between the access behaviors and URL addresses, wherein one access behavior corresponds to one or more URL addresses. Specifically, each website access record is marked with the access behavior whose URL addresses in the correspondence have the greatest character string similarity to the URL address in that record.
Wherein the second neural network model comprises the recurrent neural network, a convolutional neural network, and a dense (fully connected) layer connected to the recurrent neural network and the convolutional neural network;
the file-download additional feature vectors and the non-file-download additional feature vectors are input into the convolutional neural network for feature extraction, and the dense layer fuses the output results of the convolutional neural network and the recurrent neural network and predicts the probability of the next file downloading behavior of the target website user.
In a second aspect, an embodiment of the present application is based on the same concept, and further provides a device for analyzing and predicting a file downloading behavior based on a website access record, where the device implements the method for analyzing and predicting a file downloading behavior based on a website access record, and the device includes:
an acquisition module, configured to acquire a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior;
a first extraction module, configured to group the website access records of the target website by user to obtain personal access records corresponding to each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records by time period to obtain time period access records corresponding to each time period, and extract a plurality of consecutive access records from the time period access records that do not contain the file downloading behavior as a non-file-download feature sequence;
a prediction module, configured to input the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model and predict the probability of occurrence of the file downloading behavior of the target website user.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the computer program to perform the method for predicting file downloading behavior based on website access record analysis as described above.
In a fourth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes: a program or instructions which, when run on a computer, causes the computer to perform a method of predicting file download behaviour based on website visitation record analysis as described above.
In a fifth aspect, embodiments of the present application provide a readable storage medium, in which a computer program is stored, where the computer program includes program code for controlling a process to execute the process, where the process includes a method for predicting file download behavior based on website visitation record analysis as described in any of the above embodiments.
According to the method for analyzing and predicting file downloading behavior based on website access records, the website access records generated by users on the target website are analyzed, the pre-file-download feature sequence and the non-file-download feature sequence are extracted from them, the neural network is made to learn the users' file downloading behavior patterns, and the first neural network model is trained to predict the probability of a user's next file downloading behavior on the target website. In particular, since a user may perform a series of associated access behaviors before downloading a file, the embodiment of the present application uses a recurrent neural network to learn the pre-file-download feature sequence and the non-file-download feature sequence; because a recurrent neural network has memory and shares parameters, it is better suited to learning the nonlinear features of these sequences.
It is worth mentioning that the method does not simply predict whether a downloading behavior occurs from the resource name and resource size, nor does it predict only from the website access records preceding a file download taken as a feature sequence. Instead, it extracts an all-day distribution feature vector, a periodic feature vector, a type distribution feature vector and an adjacent feature vector as additional feature vectors from the website access records generated by users on the target website, and combines the behavior feature sequences with the additional feature vectors to train a second neural network model, so as to improve the accuracy of predicting a user's next file downloading behavior on the target website.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so that other features, objects and advantages of the application can be understood more readily.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for predicting file download behavior based on website visitation record analysis according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a second neural network model in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for analyzing and predicting file downloading behavior based on website access records according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims that follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Method Embodiment
The embodiment provides a method for analyzing and predicting file downloading behaviors based on website access records, which comprises the steps of analyzing the website access records of users in a website, extracting a characteristic sequence before file downloading and a non-file downloading characteristic sequence from the website access records, enabling a neural network model to learn a file downloading behavior pattern of the users, and training the neural network model to predict the occurrence probability of the next file downloading behavior of the users in the website.
Referring to fig. 1, fig. 1 is a flowchart of a method for predicting file downloading behavior based on website visitation record analysis according to an embodiment of the present application.
As shown in fig. 1, the method comprises steps S1-S4:
Step S1: acquiring a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior.
Usually, according to the needs of network security auditors, a certain website or a certain class of websites is taken as a target website, and website access records generated by all users in the target website for a long time are collected.
After the website access records are collected, they can be filtered to remove the auxiliary records generated when a user loads a page of the target website, leaving the user's actual website access records on the target website. Specifically, requests for resources such as jpeg, png, ico, js and css are filtered out of the website access records, together with other website access records that are commonly known to be useless.
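The filtering step can be sketched as follows. This is a minimal illustration, assuming each record is a dict with a `url` key; the suffix list, the record layout and the function names are hypothetical choices for the example and are not specified by the application.

```python
from urllib.parse import urlparse

# Assumed suffix list: the text names jpeg, png, ico, js and css "and the like";
# ".jpg" is added here only as an illustrative extra.
STATIC_SUFFIXES = (".jpeg", ".jpg", ".png", ".ico", ".js", ".css")


def is_auxiliary_request(url: str) -> bool:
    """Return True when the URL path ends with a static-resource suffix."""
    path = urlparse(url).path.lower()  # the query string is ignored
    return path.endswith(STATIC_SUFFIXES)


def filter_records(records):
    """Keep only records that are not auxiliary page-resource requests."""
    return [r for r in records if not is_auxiliary_request(r["url"])]


records = [
    {"user": "u1", "url": "https://example.com/login"},
    {"user": "u1", "url": "https://example.com/static/app.js?v=3"},
    {"user": "u1", "url": "https://example.com/favicon.ico"},
    {"user": "u1", "url": "https://example.com/report/42"},
]
kept = filter_records(records)
print([r["url"] for r in kept])
```

Matching on the parsed path rather than the raw URL keeps query strings such as `?v=3` from hiding the suffix.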
In this step, in order to mark the access behavior of the website access records quickly, the website access records can be preprocessed by establishing a correspondence between URLs and behavior types. The method thus further comprises: marking the website access records with access behaviors, wherein the access behavior marks at least comprise a file downloading behavior; and establishing a correspondence between the access behaviors and URL addresses, wherein one access behavior corresponds to one or more URL addresses. For example, the filtered website access records are respectively marked as a login system behavior, an access summary page behavior, an access details page behavior, a search behavior and a file download behavior, wherein the login system behavior corresponds to two different URL addresses, the access summary page behavior corresponds to three other different URL addresses, and so on, each behavior corresponding to one or more associated URL addresses. The granularity of the behavior marks determines the accuracy and generalization ability of the recognition: when the granularity is coarser, the recognition accuracy is relatively lower but the generalization ability is higher; conversely, when the granularity is finer, the recognition accuracy is higher but the generalization ability is lower. Generalization ability here refers to the ability of the method to recognize access records that have not previously been marked. The granularity is chosen according to actual needs, and no unified standard is prescribed.
Specifically, each website access record is marked with the access behavior whose URL addresses in the correspondence have the greatest character string similarity to the URL address in that record. That is, the character string similarity between the URL of the website access record and each URL address corresponding to each access behavior type is calculated; the results within each access behavior type are averaged, the averages are sorted, and the behavior type with the highest average is selected as the behavior mark of the website access record.
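A minimal sketch of this marking rule is given below. The application does not name a similarity measure, so the ratio from Python's `difflib.SequenceMatcher` is used as an assumed stand-in; the behavior names and URL paths in `BEHAVIOR_URLS` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical correspondence between access behaviors and known URL addresses.
BEHAVIOR_URLS = {
    "login": ["/auth/login", "/auth/sso/login"],
    "search": ["/search?q=a", "/search/advanced?q=b"],
    "file_download": ["/files/download?id=1", "/files/export/download?id=2"],
}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed choice of measure."""
    return SequenceMatcher(None, a, b).ratio()


def mark_behavior(url: str, behavior_urls=BEHAVIOR_URLS) -> str:
    """Average the similarity per behavior type and pick the highest average."""
    averages = {
        behavior: mean(similarity(url, known) for known in known_urls)
        for behavior, known_urls in behavior_urls.items()
    }
    return max(averages, key=averages.get)


label = mark_behavior("/files/download?id=977")
print(label)
```

Averaging within each behavior type means a single accidental near-match in another type is less likely to decide the mark.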
In addition, if the number of URL addresses corresponding to an access behavior type is too large, proportional sampling may be used to extract a certain proportion of them, for example 10% of the URL addresses, as the samples that participate in the calculation, and the website access records are then marked with behaviors according to the method described above.
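Proportional sampling of this kind might look like the sketch below. The 10% fraction comes from the text; the fixed seed, the minimum of one sample and the function name are assumptions made for the example.

```python
import random


def sample_urls(urls, fraction=0.10, seed=0):
    """Draw a fixed proportion of URL addresses without replacement.

    At least one URL is always kept (an assumption), and a seeded generator
    makes the draw repeatable.
    """
    count = max(1, round(len(urls) * fraction))
    return random.Random(seed).sample(urls, count)


urls = [f"/detail/{i}" for i in range(50)]
sampled = sample_urls(urls)
print(len(sampled))
```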
Step S2: grouping the website access records of the target website by user to obtain personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence.
In this step, the website access records of the website are grouped by the unique identifier of the user, for example by user ID, to obtain the personal access records corresponding to each user. The personal access records of each user are then arranged in chronological order, the website access records marked as file downloading behaviors are located, and a plurality of consecutive website access records preceding each of them are extracted as a pre-file-download feature sequence. The number of pre-file-download feature sequences extracted corresponds to the number of file downloading behaviors of the user.
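Step S2 can be sketched as follows, assuming records are dicts with `user`, `ts` and `behavior` keys. The window length of 3 and the choice to skip downloads that have fewer than `window` preceding records are assumptions; the application only says "a plurality of consecutive" records.

```python
from collections import defaultdict


def pre_download_sequences(records, window=3):
    """Group by user, sort by time, and collect the `window` records
    immediately preceding every file download."""
    by_user = defaultdict(list)
    for record in records:
        by_user[record["user"]].append(record)

    sequences = []
    for user_records in by_user.values():
        user_records.sort(key=lambda r: r["ts"])  # chronological order
        for i, record in enumerate(user_records):
            # Assumption: downloads without enough history are skipped.
            if record["behavior"] == "file_download" and i >= window:
                preceding = user_records[i - window:i]
                sequences.append([r["behavior"] for r in preceding])
    return sequences


records = [
    {"user": "u1", "ts": 4, "behavior": "file_download"},
    {"user": "u1", "ts": 1, "behavior": "login"},
    {"user": "u2", "ts": 2, "behavior": "file_download"},
    {"user": "u1", "ts": 3, "behavior": "detail"},
    {"user": "u1", "ts": 2, "behavior": "summary"},
    {"user": "u2", "ts": 1, "behavior": "login"},
    {"user": "u1", "ts": 5, "behavior": "search"},
    {"user": "u1", "ts": 6, "behavior": "detail"},
    {"user": "u1", "ts": 7, "behavior": "file_download"},
]
sequences = pre_download_sequences(records)
print(sequences)
```

One sequence is produced per qualifying download, matching the statement that the number of sequences follows the number of file downloading behaviors.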
Step S3: grouping the personal access records by time period to obtain time period access records corresponding to each time period, and extracting a plurality of consecutive access records from the time period access records that do not contain the file downloading behavior as a non-file-download feature sequence.
The website access records in the personal access records obtained in step S2 are grouped by time period to obtain time period access records, for example one group per day, and consecutive website access records are extracted from the time period access records that contain no file downloading behavior as a non-file-download feature sequence.
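A sketch of step S3 for one user's personal access records follows, using one day as the time period as in the example above. Taking the first `window` records of each download-free day, and skipping days with fewer records than that, are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def non_download_sequences(personal_records, window=3):
    """Group one user's records by calendar day and take `window` consecutive
    records from each day that contains no file download."""
    by_day = defaultdict(list)
    for record in sorted(personal_records, key=lambda r: r["ts"]):
        by_day[record["ts"].date()].append(record["behavior"])

    sequences = []
    for behaviors in by_day.values():
        if "file_download" in behaviors or len(behaviors) < window:
            continue  # assumption: short or download-containing days are skipped
        sequences.append(behaviors[:window])
    return sequences


personal_records = [
    {"ts": datetime(2021, 7, 1, 9, 0), "behavior": "login"},
    {"ts": datetime(2021, 7, 1, 9, 5), "behavior": "summary"},
    {"ts": datetime(2021, 7, 1, 9, 10), "behavior": "detail"},
    {"ts": datetime(2021, 7, 1, 9, 20), "behavior": "search"},
    {"ts": datetime(2021, 7, 2, 10, 0), "behavior": "login"},
    {"ts": datetime(2021, 7, 2, 10, 3), "behavior": "detail"},
    {"ts": datetime(2021, 7, 2, 10, 4), "behavior": "file_download"},
    {"ts": datetime(2021, 7, 3, 8, 0), "behavior": "login"},
    {"ts": datetime(2021, 7, 3, 8, 2), "behavior": "search"},
]
negatives = non_download_sequences(personal_records)
print(negatives)
```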
Step S4: inputting the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model, and predicting the probability of occurrence of the file downloading behavior of the target website user.
The first neural network model comprises a recurrent neural network, that is, a recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain. Because a recurrent neural network has memory, shares parameters and is Turing complete, it has an advantage in learning the nonlinear features of a sequence. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, which records the feature information and order information of each node in the two sequences and converts them into an information matrix for further computation downstream in the first neural network model, finally yielding the predicted probability that a user of the target website will perform a file downloading behavior. The internal computation of a neural network cannot be described in intuitive terms, so it is not detailed here.
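To illustrate how a recurrent layer turns a behavior sequence into an information matrix, the sketch below runs the forward pass of a plain Elman-style recurrent unit in pure Python. The one-hot encoding, the hidden size of 4, the random untrained weights and all names are assumptions; the application does not specify the cell type or any hyperparameters, and training is omitted.

```python
import math
import random

BEHAVIORS = ["login", "summary", "detail", "search", "file_download"]


def one_hot(behavior):
    """Encode a behavior mark as a one-hot vector (assumed encoding)."""
    return [1.0 if b == behavior else 0.0 for b in BEHAVIORS]


def init_weights(input_size, hidden_size, seed=0):
    """Random, untrained weights used only to make the sketch runnable."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return (rand_matrix(hidden_size, input_size),
            rand_matrix(hidden_size, hidden_size),
            [0.0] * hidden_size)


def rnn_forward(sequence, w_in, w_rec, bias):
    """h_t = tanh(W_in x_t + W_rec h_{t-1} + b); returns one row per time step."""
    hidden = [0.0] * len(bias)
    states = []
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hk for w, hk in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states


w_in, w_rec, bias = init_weights(len(BEHAVIORS), 4)
seq = [one_hot(b) for b in ["login", "summary", "detail"]]
matrix = rnn_forward(seq, w_in, w_rec, bias)  # 3 time steps x 4 hidden units
print(len(matrix), len(matrix[0]))
```

Because each hidden state depends on the previous one, the same behaviors in a different order produce a different matrix, which is the order information referred to above.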
A user may perform a series of associated website visits before downloading a file, so a recurrent neural network is used to learn from and analyze the plurality of consecutive website access records preceding the website access record of every file downloading behavior. However, predicting file downloading behavior only from the pre-file-download feature sequence and the non-file-download feature sequence relies on a single kind of feature, so features of more dimensions need to be added to improve the accuracy of the final prediction.
In other embodiments, additional feature vectors corresponding to the access behaviors may be extracted according to the personal access records, where the additional feature vectors include day-wide distribution feature vectors, period feature vectors, type distribution feature vectors, and neighboring feature vectors, and file download additional feature vectors and non-file download additional feature vectors are generated;
inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability of occurrence of file download behavior of the target website user.
The all-day distribution feature vector is the proportion of the access behaviors in each time period of the day in the total number of behaviors; the periodic feature vector is the maximum time interval at which the access behavior occurs periodically; the type distribution feature vector is the proportion of the access behaviors in the total number of behaviors; the adjacent feature vector is the number of access behaviors. Specifically, the all-day distribution feature vector is generated by discretizing the access behaviors into the time periods of the day and counting, for each time period, the proportion of occurrences of a given behavior in the total number of behaviors over the whole day; the periodic feature vector is the maximum time interval at which a given behavior occurs periodically; the type distribution feature vector is generated by counting the proportion of each access behavior among all behaviors, and its length is the number of distinct access behavior types; the adjacent feature vector is generated by counting the transition probabilities between adjacent access behaviors, and its length is the number of ordered arrangements (permutations) of the access behavior types.
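One possible reading of these four definitions is sketched below on records given as `(hour, behavior)` pairs for a single day. Several details are assumptions: hourly slots, measuring the periodic feature as the largest gap in hours between successive occurrences, and giving the adjacent vector one entry per ordered pair of types (including self-transitions), so its length is the square of the number of types.

```python
from collections import Counter
from itertools import product

TYPES = ["login", "summary", "detail", "search", "file_download"]


def all_day_distribution(records, behavior, slots=24):
    """Per time slot: occurrences of `behavior` / total behaviors in the day."""
    counts = [0] * slots
    for hour, b in records:
        if b == behavior:
            counts[hour * slots // 24] += 1
    total = len(records)
    return [c / total if total else 0.0 for c in counts]


def max_interval(records, behavior):
    """Largest gap (in hours) between successive occurrences of `behavior`."""
    hours = sorted(h for h, b in records if b == behavior)
    return max((later - earlier for earlier, later in zip(hours, hours[1:])), default=0)


def type_distribution(records, types=TYPES):
    """Share of each behavior type; length = number of distinct types."""
    counts = Counter(b for _, b in records)
    total = len(records)
    return [counts[t] / total if total else 0.0 for t in types]


def adjacent_transitions(records, types=TYPES):
    """Transition probability for every ordered pair (a, b) of types."""
    seq = [b for _, b in sorted(records, key=lambda r: r[0])]
    pair_counts = Counter(zip(seq, seq[1:]))
    out_counts = Counter(seq[:-1])
    return [
        pair_counts[(a, b)] / out_counts[a] if out_counts[a] else 0.0
        for a, b in product(types, repeat=2)
    ]


records = [
    (9, "login"), (9, "summary"), (10, "detail"), (10, "file_download"),
    (15, "search"), (16, "detail"), (16, "file_download"), (21, "summary"),
]
day_vec = all_day_distribution(records, "file_download")
period = max_interval(records, "file_download")
type_vec = type_distribution(records)
adj_vec = adjacent_transitions(records)
print(period, type_vec)
```

In this toy day, every "detail" visit is followed by a download, so the corresponding transition entry is 1.0.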
For the specific structure of the second neural network model, refer to fig. 2, which is a schematic structural diagram of the second neural network model according to an embodiment of the present application. As shown in fig. 2, the model includes a recurrent neural network, a convolutional neural network, and a dense (fully connected) layer connected to both. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, which extracts abstract features of the ordering within the feature sequences; the file-download additional feature vector and the non-file-download additional feature vector are input into the convolutional neural network, which is better suited to extracting features that do not depend on the order of the input data; the dense layer then fuses the outputs of the recurrent neural network and the convolutional neural network and predicts the probability that the website user will perform a file downloading behavior.
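By way of non-limiting illustration, the data flow of such a two-branch model can be sketched in plain Python as follows. This is not the model of fig. 2 itself: the simple Elman-style recurrence, the single 1-D convolution with global max pooling, the layer sizes, and the randomly initialized (untrained) weights are all assumptions made for this example, and all function and parameter names are hypothetical. The sketch shows only how a behavior sequence and an additional feature vector are encoded separately and fused by one fully connected layer with a sigmoid output.

```python
import math
import random


def rnn_encode(sequence, w_in, w_rec):
    """Elman-style recurrent pass over the sequence; returns the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * h for wr, h in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


def conv_encode(vector, kernels):
    """1-D convolution, ReLU, then global max pooling: one feature per kernel.

    Assumes the vector is at least as long as each kernel.
    """
    features = []
    for kernel in kernels:
        n_windows = len(vector) - len(kernel) + 1
        windows = (vector[i:i + len(kernel)] for i in range(n_windows))
        features.append(
            max(max(0.0, sum(k * v for k, v in zip(kernel, w))) for w in windows)
        )
    return features


def dense_sigmoid(features, weights, bias):
    """Fully connected output unit producing a probability in (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict_download_probability(sequence, extra_vector, params):
    """Concatenate both branch outputs and pass them through the dense layer."""
    fused = rnn_encode(sequence, params["w_in"], params["w_rec"])
    fused = fused + conv_encode(extra_vector, params["kernels"])
    return dense_sigmoid(fused, params["w_out"], params["b_out"])


def random_params(input_dim, hidden_dim, n_kernels, kernel_size, seed=0):
    """Untrained placeholder weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in": mat(hidden_dim, input_dim),
        "w_rec": mat(hidden_dim, hidden_dim),
        "kernels": mat(n_kernels, kernel_size),
        "w_out": mat(1, hidden_dim + n_kernels)[0],
        "b_out": 0.0,
    }
```

In this sketch each element of `sequence` is a one-hot encoding of an access behavior, which is one possible encoding among several. In practice the weights would be obtained by training on labeled pre-file-download and non-file-download samples, typically with a deep learning framework rather than hand-written loops.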
Finally, the predicted probability of the file downloading behavior is compared with a preset occurrence threshold; when the probability is greater than the threshold, this indicates that the user is about to perform a file downloading behavior on the target website.
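By way of non-limiting illustration, the threshold comparison can be expressed as a short Python function. The function name, the dictionary layout, and the default threshold of 0.5 are hypothetical; the embodiment does not specify a threshold value.

```python
def flag_likely_downloaders(probabilities, threshold=0.5):
    """Return the IDs of users whose predicted probability exceeds the threshold.

    `probabilities` maps a user ID to the model's predicted download probability.
    The comparison is strict, matching "greater than the occurrence threshold".
    """
    return sorted(user for user, p in probabilities.items() if p > threshold)
```

A user whose probability equals the threshold exactly is not flagged under this reading.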
The method for analyzing and predicting file downloading behavior based on website access records can be used to construct the first neural network model and the second neural network model either for a single website or for websites of the same type. Following the idea of the method, the feature sequences and feature vectors can also be replaced, and the required behavior feature sequences and behavior feature vectors extracted, to train a corresponding neural network model.
Example two
Based on the same concept, referring to fig. 3, this embodiment further provides an apparatus for analyzing and predicting file downloading behavior based on website access records. The apparatus implements the method for analyzing and predicting file downloading behavior based on website access records described above, and includes:
an acquisition module, configured to acquire website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file-downloading behaviors;
a first extraction module, configured to group the website access records of the target website by user to obtain the personal access records of each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records preceding a file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records by time period to obtain the time-period access records of each time period, and extract a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
a prediction module, configured to input the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model and predict the probability that a user of the target website will perform a file downloading behavior.
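By way of non-limiting illustration, the work of the first extraction module (grouping by user, ordering by time, and taking the consecutive records that precede a file download) might be sketched in Python as follows. The `(user_id, timestamp, behavior)` record layout, the window length of three records, and the `"download"` label are hypothetical choices for this example.

```python
from collections import defaultdict


def pre_download_sequences(records, window=3, download="download"):
    """Extract the `window` behaviors immediately preceding each download.

    `records` is an iterable of (user_id, timestamp, behavior) tuples in any order.
    Downloads with fewer than `window` preceding records are skipped (an assumption).
    """
    by_user = defaultdict(list)
    for user, ts, behavior in records:
        by_user[user].append((ts, behavior))

    sequences = []
    for user, visits in by_user.items():
        visits.sort(key=lambda v: v[0])  # chronological order per user
        behaviors = [b for _, b in visits]
        for i, b in enumerate(behaviors):
            if b == download and i >= window:
                sequences.append((user, behaviors[i - window:i]))
    return sequences
```

Whether an earlier download may itself appear inside a later window is not specified in the text; this sketch allows it.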
Example three
The present embodiment further provides an electronic apparatus, specifically referring to fig. 4, including a memory 304 and a processor 302, where the memory 304 stores a computer program, and the processor 302 is configured to execute the computer program to perform the steps of any one of the methods for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments.
Specifically, the processor 302 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 304 may include mass storage for data or instructions. By way of example and not limitation, the memory 304 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 304 may include removable or non-removable (or fixed) media, where appropriate. The memory 304 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 304 is a non-volatile memory. In particular embodiments, the memory 304 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 304 may be used to store or cache various initialization data files that need to be processed and/or used for communication, as well as possibly computer program instructions executed by the processor 302.
By reading and executing the computer program instructions stored in the memory 304, the processor 302 implements the method for analyzing and predicting file downloading behavior based on website access records in the above embodiments.
Optionally, the electronic apparatus may further include a transmission device 306 and an input/output device 308, where the transmission device 306 is connected to the processor 302, and the input/output device 308 is connected to the processor 302.
The transmission device 306 may be used to receive or transmit data via a network. Specific examples of the network may include a wired or wireless network provided by a communication provider of the electronic apparatus. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 306 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.
The input/output device 308 is used for inputting or outputting information. For example, the input/output device may be a display screen, a mouse, a keyboard, or another device. In this embodiment, the input device is used to input the acquired information; the input information may be data, tables, images, or real-time video, and the output information may be text, charts, alarm information, etc. displayed by the service system.
Optionally, in this embodiment, the processor 302 may be configured to execute the following steps by means of a computer program:
acquiring website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file-downloading behaviors;
grouping the website access records of the target website by user to obtain the personal access records of each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding a file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records by time period to obtain the time-period access records of each time period, and extracting a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
inputting the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model, and predicting the probability that a user of the target website will perform a file downloading behavior.
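By way of non-limiting illustration, the third step above (grouping one user's records by time period and drawing sequences only from periods that contain no download) might be sketched in Python as follows. The one-hour period, the window length, the use of epoch seconds, and the sliding-window extraction are assumptions made for this example; the text states only that a plurality of consecutive records is extracted.

```python
from collections import defaultdict


def non_download_sequences(visits, window=3, period_seconds=3600, download="download"):
    """Extract fixed-length behavior windows from download-free time periods.

    `visits` is one user's list of (epoch_seconds, behavior) pairs in any order.
    """
    periods = defaultdict(list)
    for ts, behavior in sorted(visits):
        periods[ts // period_seconds].append(behavior)

    sequences = []
    for _, behaviors in sorted(periods.items()):
        if download in behaviors:
            continue  # only periods without any file download qualify
        for i in range(len(behaviors) - window + 1):
            sequences.append(behaviors[i:i + window])
    return sequences
```

Periods with fewer than `window` records contribute nothing in this sketch, so the resulting negative samples have the same length as the pre-file-download sequences.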
In addition, in combination with the method for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments, the embodiments of the present application may be implemented as a computer program product. The computer program product includes a program or instructions which, when run on a computer, cause the computer to perform the method for analyzing and predicting file downloading behavior based on website access records of any of the above embodiments.
In addition, in combination with the method for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments, the embodiments of the present application may be implemented as a readable storage medium. The readable storage medium has a computer program stored thereon; the computer program includes program code for controlling a process to execute a procedure that includes the method for analyzing and predicting file downloading behavior based on website access records of any of the above embodiments.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also called program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples express only several embodiments of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (7)

1. A method for analyzing and predicting file downloading behavior based on website access records, comprising the following steps:
acquiring website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file-downloading behaviors;
grouping the website access records by user to obtain the personal access records of each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding a file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records by time period to obtain the time-period access records of each time period, and extracting a plurality of consecutive time-period access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
extracting additional feature vectors corresponding to the access behaviors from the personal access records, wherein the additional feature vectors include an all-day distribution feature vector, a periodic feature vector, a type distribution feature vector, and a neighboring feature vector, and generating file-download additional feature vectors and non-file-download additional feature vectors; inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector, and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability that a user of the target website will perform a file downloading behavior;
wherein the second neural network model comprises a recurrent neural network, a convolutional neural network, and a dense layer connected to the recurrent neural network and the convolutional neural network; the pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network for feature extraction, the recurrent neural network extracting abstract features of the ordering within the feature sequences; and the file-download additional feature vector and the non-file-download additional feature vector are input into the convolutional neural network for feature extraction, after which the dense layer fuses the outputs of the recurrent neural network and the convolutional neural network and predicts the probability that the website user will perform a file downloading behavior.
2. The method for analyzing and predicting file downloading behavior based on website access records according to claim 1, wherein the all-day distribution feature vector is the proportion of the access behaviors in each time period of the day to the total number of behaviors; the periodic feature vector is the maximum time interval at which an access behavior occurs periodically; the type distribution feature vector is the proportion of each access behavior in the total number of behaviors; and the neighboring feature vector is the number of access behaviors.
3. The method for analyzing and predicting file downloading behavior based on website access records according to claim 1, further comprising: marking the website access records with access behaviors, wherein the marked access behaviors include at least the file downloading behavior; and establishing a correspondence between the access behaviors and URL addresses, wherein each access behavior corresponds to one or more URL addresses.
4. The method for analyzing and predicting file downloading behavior based on website access records according to claim 3, wherein a website access record is marked with the access behavior whose URL address in the correspondence has the greatest string similarity to the URL address in the website access record.
5. An apparatus for analyzing and predicting file downloading behavior based on website access records, comprising:
an acquisition module, configured to acquire website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file-downloading behaviors;
a first extraction module, configured to group the website access records by user to obtain the personal access records of each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records preceding a file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records by time period to obtain the time-period access records of each time period, and extract a plurality of consecutive time-period access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence; and to extract additional feature vectors corresponding to the access behaviors from the personal access records, wherein the additional feature vectors include an all-day distribution feature vector, a periodic feature vector, a type distribution feature vector, and a neighboring feature vector, and generate file-download additional feature vectors and non-file-download additional feature vectors;
a prediction module, configured to input the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector, and the non-file-download additional feature vector into a trained second neural network model and predict the probability that a user of the target website will perform a file downloading behavior;
wherein the second neural network model comprises a recurrent neural network, a convolutional neural network, and a dense layer connected to the recurrent neural network and the convolutional neural network; the pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network for feature extraction, the recurrent neural network extracting abstract features of the ordering within the feature sequences; and the file-download additional feature vector and the non-file-download additional feature vector are input into the convolutional neural network for feature extraction, after which the dense layer fuses the outputs of the recurrent neural network and the convolutional neural network and predicts the probability that the website user will perform a file downloading behavior.
6. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for analyzing and predicting file downloading behavior based on website access records according to any one of claims 1-4.
7. A readable storage medium having stored thereon a computer program comprising program code for controlling a process to execute a procedure, the procedure comprising the method for analyzing and predicting file downloading behavior based on website access records according to any one of claims 1-4.
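By way of non-limiting illustration of the marking described in claims 3 and 4, a website access record could be labeled with the access behavior whose known URL is most similar to the record's URL. The following Python sketch uses `difflib.SequenceMatcher` from the standard library as the string similarity measure; this choice of measure, the function name, and the example URLs are assumptions, since the claims do not name a particular similarity metric.

```python
from difflib import SequenceMatcher


def label_access_behavior(url, behavior_urls):
    """Return (behavior, score) for the known URL most similar to `url`.

    `behavior_urls` maps each access behavior to one or more representative URLs,
    mirroring the correspondence between access behaviors and URL addresses.
    """
    best_behavior, best_score = None, -1.0
    for behavior, known_urls in behavior_urls.items():
        for known in known_urls:
            score = SequenceMatcher(None, url, known).ratio()
            if score > best_score:
                best_behavior, best_score = behavior, score
    return best_behavior, best_score
```

For example, with a mapping such as `{"file_download": ["/files/download?id=1"], "search": ["/search?q=report"]}`, the URL `/files/download?id=42` would be labeled `file_download`.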
CN202110871515.9A 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record Active CN113612639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110871515.9A CN113612639B (en) 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record


Publications (2)

Publication Number Publication Date
CN113612639A CN113612639A (en) 2021-11-05
CN113612639B true CN113612639B (en) 2022-11-11

Family

ID=78306247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110871515.9A Active CN113612639B (en) 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record

Country Status (1)

Country Link
CN (1) CN113612639B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423442B (en) * 2017-08-07 2020-09-25 火烈鸟网络(广州)股份有限公司 Application recommendation method and system based on user portrait behavior analysis, storage medium and computer equipment
CN109902849B (en) * 2018-06-20 2021-11-30 华为技术有限公司 User behavior prediction method and device, and behavior prediction model training method and device
CN111798259A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Application recommendation method and device, storage medium and electronic equipment
CN111797978A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Internal threat detection method and device, electronic equipment and storage medium
CN112801719A (en) * 2021-03-01 2021-05-14 深圳市欢太科技有限公司 User behavior prediction method, user behavior prediction device, storage medium, and apparatus



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant