
CN110381094A - Method and system for user profiling and behavior analysis based on DPI technology - Google Patents


Info

Publication number
CN110381094A
Authority
CN
China
Prior art keywords
message
data
technology
buffer area
protocol label
Legal status
Pending
Application number
CN201910855391.8A
Other languages
Chinese (zh)
Inventor
刘慰慰
杨昆
阎星娥
严荣明
张�林
魏红道
江汀
刘皓峥
Current Assignee
Nanjing Fly Data Technology Co Ltd
Original Assignee
Nanjing Fly Data Technology Co Ltd
Priority date
2019-09-11
Filing date
2019-09-11
Publication date
2019-10-25
Application filed by Nanjing Fly Data Technology Co Ltd
Priority to CN201910855391.8A
Publication of CN110381094A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention relates to the field of data processing and discloses a method and system for user profiling and behavior analysis based on deep packet inspection (DPI) technology, which addresses the problem of inaccurate user profiling and behavior analysis. The key points of the technical solution are as follows. After a first terminal sets a data protocol label and a network interface card (NIC) name, a data acquisition module, a message analysis module and a data storage module are initialized. The data acquisition module then intercepts messages; when an intercepted message matches the data protocol label, its content is copied, the data protocol label is set on it, and it is added to the corresponding data buffer area. The message analysis module identifies and analyzes the message using DPI technology to obtain useful information; the data storage module repeatedly takes useful information out of the data buffer area and submits it to a database. Finally, the user behavior and real-time flow table information of the first terminal can be queried from the database, so that user demand can be identified accurately.

Description

Method and system for user profiling and behavior analysis based on DPI technology
Technical field
The present disclosure relates to the field of data processing, and in particular to a method and system for user profiling and behavior analysis based on DPI technology.
Background
Driven by information technology, the Internet has become an indispensable part of everyday life. As the network grows, so does the amount of information on it, and people can no longer find what is useful to them among such large volumes of data. User behavior analysis addresses this: by learning what a user is interested in and providing content accordingly, more accurate services can be offered. In addition, many public places such as schools, hotels and airports provide open network access, which makes it especially important to use network resources more reasonably and to keep these networks secure against illegal use.
In the early days of the Internet there were few services, and each application communicated on a fixed port number, so nearly all network traffic could be identified by port alone. As the Internet has grown, the number of users and the variety of network services have increased, and more and more services no longer rely on fixed port numbers. This makes network traffic harder to identify, and accurately identifying user demand has become a problem that urgently needs to be solved.
Summary of the invention
The present disclosure provides a method and system for user profiling and behavior analysis based on DPI technology, with the technical aim of accurately identifying user demand.
The disclosure achieves this aim through the following technical solution:
A method for user profiling and behavior analysis based on DPI technology, comprising:
setting a data protocol label and a network interface card (NIC) name, the data protocol label corresponding to a first data buffer area;
after initialization, intercepting messages from the NIC using zero-copy technology, and matching each message against the data protocol label;
if the match fails, intercepting the next message;
if the match succeeds, copying the message content, setting the data protocol label on the message, and caching the message in the first data buffer area;
identifying and analyzing the message using DPI technology to obtain its useful information, and caching the useful information in a second data buffer area;
taking the useful information out of the second data buffer area, splicing it into an SQL statement, and submitting the statement to a database;
querying user behavior and real-time flow table information from the database.
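The claimed steps form a two-stage producer/consumer pipeline. The following C++ sketch runs the steps once, in a single thread, purely to show how data moves between the two buffer areas. It is not the patented implementation: the `RawPacket` and `UsefulInfo` types, the port-only match, and the Host-only "DPI" step are simplifying assumptions, and zero-copy capture and the real database are not modeled.

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical captured message: destination port plus application payload.
struct RawPacket {
    uint16_t dst_port;
    std::string payload;
};

// Hypothetical "useful information" produced by the analysis step.
struct UsefulInfo {
    int tag;
    std::string host;
};

// Matching step: port 80/8080 -> label 1 (the HTTP rows of Table 1).
std::optional<int> MatchTag(const RawPacket& p) {
    if (p.dst_port == 80 || p.dst_port == 8080) return 1;
    return std::nullopt;
}

// Toy analysis step: pull the Host header out of an HTTP request.
std::optional<UsefulInfo> Analyze(int tag, const RawPacket& p) {
    const std::string key = "\r\nHost: ";
    auto pos = p.payload.find(key);
    if (pos == std::string::npos) return std::nullopt;
    auto start = pos + key.size();
    auto end = p.payload.find("\r\n", start);  // npos is clamped by substr
    return UsefulInfo{tag, p.payload.substr(start, end - start)};
}

// Runs every step once over a batch of captured messages; returns SQL text.
std::vector<std::string> RunPipeline(const std::vector<RawPacket>& captured) {
    std::deque<std::pair<int, RawPacket>> first_buffer;  // labelled copies
    std::deque<UsefulInfo> second_buffer;                 // parsed results
    for (const auto& pkt : captured) {
        // On a match, copy the message with its label; otherwise move on.
        if (auto tag = MatchTag(pkt)) first_buffer.emplace_back(*tag, pkt);
    }
    while (!first_buffer.empty()) {
        auto [tag, pkt] = first_buffer.front();
        first_buffer.pop_front();
        if (auto info = Analyze(tag, pkt)) second_buffer.push_back(*info);
    }
    std::vector<std::string> sql;
    while (!second_buffer.empty()) {
        const auto& info = second_buffer.front();
        // Plain string splicing, as described; unsafe for untrusted input.
        sql.push_back("INSERT INTO http_host (host) VALUES ('" + info.host + "');");
        second_buffer.pop_front();
    }
    return sql;
}
```

Splicing values directly into SQL text, as here, is only safe for trusted input; quoting or prepared statements would be needed in practice.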
Further, the initialization comprises:
loading the data protocol label configuration;
initializing the message capture function according to the NIC name and starting the message capture thread;
dynamically loading the message analysis dynamic library, which includes at least one message analysis module, binding each message analysis module to its corresponding first data buffer area and data protocol label, and then starting the processing thread of each message analysis module;
creating a database connection pool and starting the timed storage thread.
Further, the five-tuple information of the message is obtained and matched against the data protocol label.
Further, the DPI technology includes feature-word identification, application-layer gateway identification and behavior-pattern identification; when DPI technology is used, any one of these techniques, or a combination of more than one, is selected.
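Feature-word identification is the simplest of the three techniques: the payload is compared against byte patterns known to belong to a protocol. A minimal sketch, assuming prefix matching only; the signature list (HTTP request methods and the TLS handshake record header) is illustrative rather than taken from the patent, and application-layer gateway and behavior-pattern identification are not shown. `Signature` and `IdentifyBySignature` are hypothetical names.

```cpp
#include <string>
#include <string_view>
#include <vector>

struct Signature {
    std::string_view prefix;    // bytes expected at the start of the payload
    std::string_view protocol;  // protocol reported on a match
};

// Illustrative signatures only, not a complete rule set.
const std::vector<Signature>& Signatures() {
    static const std::vector<Signature> sigs = {
        {"GET ", "HTTP"}, {"POST ", "HTTP"}, {"PUT ", "HTTP"}, {"HTTP/1.", "HTTP"},
        {std::string_view("\x16\x03", 2), "TLS"},  // TLS handshake record header
    };
    return sigs;
}

// Returns the protocol whose signature prefixes the payload, or "UNKNOWN".
std::string IdentifyBySignature(std::string_view payload) {
    for (const auto& sig : Signatures()) {
        if (payload.substr(0, sig.prefix.size()) == sig.prefix)
            return std::string(sig.protocol);
    }
    return "UNKNOWN";
}
```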
A system for user profiling and behavior analysis based on DPI technology, comprising:
a first terminal, which sets a data protocol label and an NIC name, the data protocol label corresponding to a first data buffer area;
a data acquisition module, which, after initialization, intercepts messages from the NIC using zero-copy technology;
the data acquisition module including a matching unit that matches each message against the data protocol label; if the match fails, the next message is intercepted; if the match succeeds, the message content is copied, the data protocol label is set on the message, and the message is cached in the first data buffer area;
a message analysis dynamic library, including the first data buffer area and at least one message analysis module, the message analysis module identifying and analyzing the message using DPI technology to obtain its useful information and caching the useful information in a second data buffer area;
a data storage module, including a database and the second data buffer area, which takes the useful information out of the second data buffer area, splices it into an SQL statement, and submits the statement to the database;
a Web query module, which queries the user behavior and real-time flow table information of the first terminal from the database.
Further, the Web query module includes:
a back-end query service unit, which provides an interface for querying the database;
a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
Further, the data storage module writes the useful information to the database asynchronously.
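Writing asynchronously decouples the analysis threads from database latency. The following is one possible sketch, assuming statements are batched in memory and flushed by a background thread on a fixed interval, in the spirit of the timed storage thread started during initialization. `AsyncSqlWriter` is hypothetical, and its `Sink` callback stands in for the real database call.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Batches SQL text and hands it to `sink` from a background thread,
// so producers never wait on the database.
class AsyncSqlWriter {
public:
    using Sink = std::function<void(const std::vector<std::string>&)>;
    AsyncSqlWriter(Sink sink, std::chrono::milliseconds interval)
        : sink_(std::move(sink)), interval_(interval), worker_([this] { Loop(); }) {}
    ~AsyncSqlWriter() { Stop(); }

    // Non-blocking apart from a short critical section.
    void Submit(std::string stmt) {
        std::lock_guard<std::mutex> lock(mu_);
        pending_.push_back(std::move(stmt));
    }
    // Flushes anything still pending and joins the worker thread.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void Loop() {
        std::unique_lock<std::mutex> lock(mu_);
        while (true) {
            cv_.wait_for(lock, interval_, [this] { return stopping_; });
            std::vector<std::string> batch;
            batch.swap(pending_);
            bool done = stopping_;
            lock.unlock();
            if (!batch.empty()) sink_(batch);  // e.g. one transaction per batch
            lock.lock();
            if (done && pending_.empty()) return;
        }
    }
    Sink sink_;
    std::chrono::milliseconds interval_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<std::string> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the members it uses
};
```

Submitting each batch as one transaction typically costs far less than a database round trip per statement.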
Further, the DPI technology includes feature-word identification, application-layer gateway identification and behavior-pattern identification; when DPI technology is used, any one of these techniques, or a combination of more than one, is selected.
Further, the five-tuple information of the message is obtained and matched against the data protocol label.
In summary, the beneficial effects of the disclosure are as follows. After the first terminal sets the data protocol label and the NIC name, the data acquisition module, message analysis modules and data storage module are initialized. The data acquisition module then intercepts messages using zero-copy technology; when an intercepted message matches the data protocol label, its content is copied, the data protocol label is set on it, and it is added to the corresponding first data buffer area. The message analysis module identifies and analyzes the message using DPI technology to obtain useful information; the data storage module repeatedly takes useful information out of the second data buffer area, splices it into SQL statements and submits them to the database. Finally, the user behavior and real-time flow table information of the first terminal can be queried from the database, so that user demand can be identified accurately.
Description of the drawings
Fig. 1 is a flowchart of the method of the disclosure;
Fig. 2 is a schematic diagram of the system of the disclosure;
Fig. 3 is a flowchart of the initialization process of the disclosure;
Fig. 4 is a workflow diagram of the data acquisition module;
Fig. 5 is a workflow diagram of the HTTP message analysis module;
Fig. 6 is a flowchart of the data storage module.
Detailed description of the embodiments
The disclosure is described in further detail below with reference to the drawings.
It should be understood that, in the description of the disclosure, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. In the disclosure, the first data buffer area of the message analysis module stores raw message data, and the second data buffer area of the data storage module stores the useful content that was parsed successfully.
Fig. 1 is the flowchart of the method of the disclosure, and Fig. 2 is the schematic diagram of the system. The disclosed system includes a first terminal, a data acquisition module, a message analysis dynamic library, a data storage module and a Web query module. The data acquisition module includes a matching unit; the message analysis dynamic library includes the first data buffer area and at least one message analysis module; and the data storage module includes the database and the second data buffer area. The Web query module includes a back-end query service unit, which provides an interface for querying the database, and a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
The first terminal sets the data protocol label and the NIC name, and the data protocol label corresponds to the first data buffer area. The data acquisition module intercepts messages from the NIC in zero-copy mode, and the matching unit matches each message against the data protocol label. If the match fails, the next message is intercepted; if it succeeds, the message content is copied, the data protocol label is set on the message, and the message is added to the first data buffer area. The message analysis module uses DPI technology to identify and analyze the messages cached in the first data buffer area, obtains their useful information, and caches it in the second data buffer area. The data storage module then takes the useful information out of the second data buffer area, splices it into SQL statements and submits them to the database. The Web query module queries the user behavior and real-time flow table information of the first terminal from the database.
After a message is intercepted, its five-tuple information is obtained first and then matched against the data protocol label. In addition, the data storage module extracts useful information from the second data buffer area in a loop.
Before data acquisition and message analysis begin, the system is initialized; Fig. 3 is the flowchart of the initialization process. To initialize the data acquisition module, the data protocol label configuration is loaded first; then, according to the NIC name set by the first terminal, the message capture function is initialized and the message capture thread is started. High-performance packet capture is achieved through zero-copy technology.
Loading the data protocol label configuration means reading the mapping between five-tuple relationships and protocol types into memory. The data acquisition module can then quickly determine whether a given message needs to be captured, and attach a data protocol label to each message it captures. The mapping between five-tuples and data protocol label types is configured as shown in Table 1:
Protocol number | Filter type | Protocol type | IP | Mask | Port
1 | 0x01 | 0x11 | 0 | 0 | 80
1 | 0x01 | 0x11 | 0 | 0 | 8080
2 | 0x01 | 0x11 | 0 | 0 | 53
4 | 0x01 | 0x11 | 0 | 0 | 443
Table 1
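A minimal C++ sketch of the lookup this configuration enables, under stated assumptions: an IP/mask pair of 0/0 is treated as "any address", and a rule's port is matched against either port of the message. The filter type and protocol type columns are stored but not used, because the text does not define their encodings. `TagRule`, `FiveTuple`, `MatchTags` and `kTable1` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One row of Table 1.
struct TagRule {
    int tag;              // "Protocol number" (data protocol label)
    uint8_t filter_type;  // not interpreted in this sketch
    uint8_t proto_type;   // not interpreted in this sketch
    uint32_t ip;          // 0 together with mask 0 matches any address
    uint32_t mask;
    uint16_t port;
};

// Five-tuple of a message, in host byte order.
struct FiveTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t l4_proto;
};

// Returns every distinct label whose rule matches; a message may carry several.
std::vector<int> MatchTags(const std::vector<TagRule>& rules, const FiveTuple& t) {
    std::vector<int> tags;
    for (const auto& r : rules) {
        bool ip_ok = (t.src_ip & r.mask) == (r.ip & r.mask) ||
                     (t.dst_ip & r.mask) == (r.ip & r.mask);
        bool port_ok = t.src_port == r.port || t.dst_port == r.port;
        if (ip_ok && port_ok && std::find(tags.begin(), tags.end(), r.tag) == tags.end())
            tags.push_back(r.tag);
    }
    return tags;
}

// Table 1 as in-memory rows.
const std::vector<TagRule> kTable1 = {
    {1, 0x01, 0x11, 0, 0, 80},
    {1, 0x01, 0x11, 0, 0, 8080},
    {2, 0x01, 0x11, 0, 0, 53},
    {4, 0x01, 0x11, 0, 0, 443},
};
```

Keeping the rules in a flat vector mirrors the table; with many rules, an index keyed by port would avoid scanning every row per message.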
To initialize the message analysis modules, all message analysis modules in a specified directory are first loaded dynamically. Each message analysis module implements the unified interface of a common base class, which is what allows it to be loaded. The interface class is defined as follows:
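#include <cstdint>

struct QueueInfo;  // buffer descriptor (first data buffer area); not shown in the text

// Base class implemented by every dynamically loaded message analysis module.
class ParserBase
{
public:
    ParserBase() {}
    virtual ~ParserBase() {}
    // Binds the module to its thread count and first data buffer area.
    virtual int32_t Init(int32_t ithreadnum, QueueInfo *que) { return 0; }
    // Processing loop of the module.
    virtual void Run() = 0;
    // Returns the data protocol label handled by the module.
    virtual int32_t GetTagId() = 0;
};
// Signatures of the factory and release functions each module library exports.
typedef ParserBase *create_t();
typedef void destory_t(ParserBase *base);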
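As a sketch of how a module might satisfy this interface, the following C++ defines a hypothetical `HttpParser` and the `create`/`destroy` entry points whose types match `create_t` and `destory_t`. The interface is repeated so the example compiles on its own; its `Init` default returns 0, since a non-void function must return a value. The contents of `QueueInfo`, the exported symbol names `create` and `destroy`, and the tag value 1 (the HTTP rows of Table 1) are assumptions. A loader would typically obtain these functions with `dlopen`/`dlsym`.

```cpp
#include <cstdint>

struct QueueInfo {};  // placeholder; the real buffer descriptor is not shown

class ParserBase {
public:
    ParserBase() {}
    virtual ~ParserBase() {}
    virtual int32_t Init(int32_t ithreadnum, QueueInfo *que) { return 0; }
    virtual void Run() = 0;
    virtual int32_t GetTagId() = 0;
};
typedef ParserBase *create_t();
typedef void destory_t(ParserBase *base);

// Hypothetical HTTP analysis module.
class HttpParser : public ParserBase {
public:
    int32_t Init(int32_t ithreadnum, QueueInfo *que) override {
        threads_ = ithreadnum;
        queue_ = que;  // first data buffer area bound to this module
        return que != nullptr ? 0 : -1;
    }
    void Run() override { /* pop messages from queue_, extract Host/URL/User-Agent */ }
    int32_t GetTagId() override { return 1; }  // HTTP label in Table 1

private:
    int32_t threads_ = 0;
    QueueInfo *queue_ = nullptr;
};

// C linkage keeps the names unmangled, so dlsym(handle, "create") can find them.
extern "C" ParserBase *create() { return new HttpParser(); }
extern "C" void destroy(ParserBase *base) { delete base; }
```

Pairing a factory with a release function lets the object be deleted by the same library that allocated it.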
Initializing the data storage module creates a connection to the database, in which the analysis results are saved, and starts the timed storage thread; the stored results are then available to the Web query module. Once initialization is complete, the data acquisition module starts running; its workflow is shown in Fig. 4. An original message is first obtained from the NIC in zero-copy mode. Its header information is then parsed layer by layer, from the data link layer through the network layer to the transport layer, to obtain the five-tuple and the start position of the application-layer data. The parsed five-tuple is matched against the five-tuple/protocol type mapping table. If the match fails, the message does not need to be analyzed; otherwise the message is copied and stamped with a data protocol label. A message may carry more than one data protocol label.
Finally, according to the data protocol labels it carries, the message is distributed to the corresponding first data buffer areas to await processing.
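Because a message may carry several labels, distribution places one copy in each matching buffer. A single-threaded sketch follows, using a `std::map` of vectors as a stand-in for the first data buffer areas (the real buffers would need to be thread-safe). `TaggedMessage`, `FirstBuffers` and `Distribute` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct TaggedMessage {
    int tag;
    std::string bytes;  // copied message content
};

// First data buffer areas keyed by data protocol label.
using FirstBuffers = std::map<int, std::vector<TaggedMessage>>;

// Places one copy of the message in the buffer of every label it carries;
// labels with no bound analysis module are ignored.
void Distribute(FirstBuffers& buffers, const std::string& msg, const std::vector<int>& tags) {
    for (int tag : tags) {
        auto it = buffers.find(tag);
        if (it != buffers.end()) it->second.push_back({tag, msg});
    }
}
```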
After data acquisition, the message analysis modules identify and analyze the captured messages using DPI technology. DPI technology includes feature-word identification, application-layer gateway identification and behavior-pattern identification, and the disclosure may use any one of these or a combination of more than one. Analysis methods differ slightly between protocols; the HTTP protocol analysis module is described in detail here as an example, as shown in Fig. 5.
A message is taken from the first data buffer area corresponding to the HTTP protocol, and its HTTP header content is obtained. If the HTTP header features are found, the key content is extracted with a high-performance regular-expression matching library using feature strings. The feature strings include GET, POST, PUT, Host and User-Agent, and the extracted key content includes the Host, the URL and terminal information. Once extraction succeeds, the extracted content (the useful information) is placed in the second data buffer area to wait to be saved to the database.
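The HTTP extraction step can be sketched with `std::regex`, substituted here for the unnamed high-performance regular-expression library. The patterns cover only HTTP/1.x requests using GET, POST or PUT; `HttpInfo` and `ExtractHttp` are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

struct HttpInfo {
    std::string method, url, host, user_agent;
};

// Extracts method, URL, Host and User-Agent from an HTTP/1.x request header.
std::optional<HttpInfo> ExtractHttp(const std::string& msg) {
    static const std::regex request_line(R"(^(GET|POST|PUT) (\S+) HTTP/1\.[01]\r\n)");
    static const std::regex host_re(R"(\r\nHost: *([^\r\n]+))",
                                    std::regex::ECMAScript | std::regex::icase);
    static const std::regex ua_re(R"(\r\nUser-Agent: *([^\r\n]+))",
                                  std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    // No HTTP request line means the header feature was not found.
    if (!std::regex_search(msg, m, request_line)) return std::nullopt;
    HttpInfo info{m[1].str(), m[2].str(), "", ""};
    if (std::regex_search(msg, m, host_re)) info.host = m[1].str();
    if (std::regex_search(msg, m, ua_re)) info.user_agent = m[1].str();
    return info;
}
```

`std::regex` is convenient but comparatively slow, which is presumably why the text calls for a high-performance matching library on live traffic.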
Fig. 6 is the workflow diagram of the data storage module. Taking HTTP as an example, the flow is as follows. A record is taken from the second data buffer area, and its Host is inserted into the http_host table. If the Host is already present, only its access time is updated; if the Host is new, the ID corresponding to the Host, the URL and the User-Agent are inserted into the http_parse table. If the URL is already present, its access time is updated; if the URL is new, the source IP and the Host are stored in the user_visit_host table.
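Splicing values into SQL text requires quoting. The following minimal sketch covers the first branch of this flow (a repeated Host only refreshes its access time) and the http_parse insert, assuming standard SQL quote doubling. The column names `host`, `access_time`, `host_id`, `url` and `user_agent`, and the in-memory `known` set used to detect repeats, are hypothetical.

```cpp
#include <set>
#include <string>

// Doubles single quotes so a value can be embedded in a SQL string literal.
std::string SqlQuote(const std::string& v) {
    std::string out = "'";
    for (char c : v) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return out;
}

// A Host seen before only gets its access time refreshed; a new Host is inserted.
std::string BuildHostStatement(std::set<std::string>& known, const std::string& host, long now) {
    if (!known.insert(host).second)
        return "UPDATE http_host SET access_time = " + std::to_string(now) +
               " WHERE host = " + SqlQuote(host) + ";";
    return "INSERT INTO http_host (host, access_time) VALUES (" + SqlQuote(host) + ", " +
           std::to_string(now) + ");";
}

// Row for http_parse: ID of the Host, URL and User-Agent.
std::string BuildHttpParseInsert(long host_id, const std::string& url, const std::string& ua) {
    return "INSERT INTO http_parse (host_id, url, user_agent) VALUES (" +
           std::to_string(host_id) + ", " + SqlQuote(url) + ", " + SqlQuote(ua) + ");";
}
```

Quote doubling alone is not sufficient on every database (MySQL's default mode also treats backslashes as escapes), so prepared statements are the more robust choice.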
In summary, the first terminal of the disclosure includes both virtual and physical terminals; any terminal that can implement the method of the disclosure is a first terminal of the disclosed system.
The above is an exemplary embodiment of the disclosure; the scope of protection of the disclosure is defined by the claims and their equivalents.

Claims (9)

1. A method for user profiling and behavior analysis based on DPI technology, characterized by comprising:
setting a data protocol label and an NIC name, the data protocol label corresponding to a first data buffer area;
after initialization, intercepting messages from the NIC using zero-copy technology, and matching each message against the data protocol label;
if the match fails, intercepting the next message;
if the match succeeds, copying the message content, setting the data protocol label on the message, and caching the message in the first data buffer area;
identifying and analyzing the message using DPI technology to obtain its useful information, and caching the useful information in a second data buffer area;
taking the useful information out of the second data buffer area, splicing it into an SQL statement, and submitting the statement to a database;
querying user behavior and real-time flow table information from the database.
2. The method for user profiling and behavior analysis based on DPI technology according to claim 1, characterized in that the initialization comprises:
loading the data protocol label configuration;
initializing the message capture function according to the NIC name and starting the message capture thread;
dynamically loading the message analysis dynamic library, which includes at least one message analysis module, binding each message analysis module to its corresponding first data buffer area and data protocol label, and then starting the processing thread of each message analysis module;
creating a database connection pool and starting the timed storage thread.
3. The method for user profiling and behavior analysis based on DPI technology according to claim 1 or 2, characterized in that the five-tuple information of the message is obtained and matched against the data protocol label.
4. The method for user profiling and behavior analysis based on DPI technology according to claim 3, characterized in that the DPI technology includes feature-word identification, application-layer gateway identification and behavior-pattern identification, and that when the DPI technology is used, any one of these techniques, or a combination of more than one, is selected.
5. A system for user profiling and behavior analysis based on DPI technology, characterized by comprising:
a first terminal, which sets a data protocol label and an NIC name, the data protocol label corresponding to a first data buffer area;
a data acquisition module, which, after initialization, intercepts messages from the NIC using zero-copy technology;
the data acquisition module including a matching unit that matches each message against the data protocol label; if the match fails, the next message is intercepted; if the match succeeds, the message content is copied, the data protocol label is set on the message, and the message is cached in the first data buffer area;
a message analysis dynamic library, including the first data buffer area and at least one message analysis module, the message analysis module identifying and analyzing the message using DPI technology to obtain its useful information and caching the useful information in a second data buffer area;
a data storage module, including a database and the second data buffer area, which takes the useful information out of the second data buffer area, splices it into an SQL statement, and submits the statement to the database;
a Web query module, which queries the user behavior and real-time flow table information of the first terminal from the database.
6. The system for user profiling and behavior analysis based on DPI technology according to claim 5, characterized in that the Web query module includes:
a back-end query service unit, which provides an interface for querying the database;
a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
7. The system for user profiling and behavior analysis based on DPI technology according to claim 5 or 6, characterized in that the data storage module writes the useful information to the database asynchronously.
8. The system for user profiling and behavior analysis based on DPI technology according to claim 7, characterized in that the DPI technology includes feature-word identification, application-layer gateway identification and behavior-pattern identification, and that when the DPI technology is used, any one of these techniques, or a combination of more than one, is selected.
9. The system for user profiling and behavior analysis based on DPI technology according to claim 8, characterized in that the five-tuple information of the message is obtained and matched against the data protocol label.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910855391.8A CN110381094A (en) 2019-09-11 2019-09-11 Method and system for user profiling and behavior analysis based on DPI technology

Publications (1)

Publication Number Publication Date
CN110381094A (en) 2019-10-25

Family

ID=68261395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910855391.8A Pending CN110381094A (en) 2019-09-11 2019-09-11 Method and system for user profiling and behavior analysis based on DPI technology

Country Status (1)

Country Link
CN (1) CN110381094A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105491018A (en) * 2015-11-24 2016-04-13 北京中电普华信息技术有限公司 System and method for network data security analysis based on DPI technology
CN106982150A (en) * 2017-03-27 2017-07-25 重庆邮电大学 A kind of mobile Internet user behavior analysis method based on Hadoop
CN107040405A (en) * 2017-03-13 2017-08-11 中国人民解放军信息工程大学 Passive type various dimensions main frame Fingerprint Model construction method and its device under network environment
US20170289283A1 (en) * 2016-04-01 2017-10-05 App Annie Inc. Automated dpi process

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177595A (en) * 2019-12-20 2020-05-19 杭州九略智能科技有限公司 Method for extracting asset information in template mode aiming at HTTP (hyper text transport protocol)
CN111177595B (en) * 2019-12-20 2024-04-05 杭州九略智能科技有限公司 Method for extracting asset information by templating HTTP protocol

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-10-25