CN110381094A - A kind of method and system of user portrait and behavioural analysis based on DPI technology - Google Patents
- Publication number
- CN110381094A
- Authority
- CN
- China
- Prior art keywords
- message
- data
- technology
- buffer area
- protocol label
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- Computer And Data Communications (AREA)
Abstract
The present invention relates to the technical field of data processing and discloses a method and system for user profiling and behavior analysis based on DPI (deep packet inspection) technology, which solves the problem of inaccurate user profiling and behavior analysis. The key points of the technical solution are as follows. After a first terminal sets data protocol labels and a network interface card (NIC) name, a data acquisition module, a packet analysis module and a data storage module are initialized. The data acquisition module then intercepts packets; when an intercepted packet matches a data protocol label, the packet content is copied, the data protocol label is set on the packet, and the packet is added to the corresponding data buffer. The packet analysis module identifies and analyzes the packet using DPI technology to obtain useful information, and the data storage module cyclically takes the useful information out of the data buffer and submits it to a database. Finally, the user behavior and real-time flow table information of the first terminal can be queried from the database, so that user demand can be identified accurately.
Description
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a method and system for user profiling and behavior analysis based on DPI technology.
Background art
Driven by information technology, the Internet has become an indispensable part of people's daily lives. As the network develops, the amount of information on it keeps growing, to the point that people can no longer find the information that is useful to them in the mass of data. In this situation, analyzing user behavior reveals what a user is interested in, so that content can be provided in a targeted way and the user can be given a more accurate service. In addition, many public places such as schools, hotels and airports offer open network access, which makes it especially important to use network resources more rationally and to keep these networks secure and free from illegal use.
In the early days of the Internet there were few services, and applications communicated over fixed port numbers, so almost all network traffic could be identified by port alone. As the Internet developed, the number of network users and the variety of network services grew, and more and more services no longer rely on fixed port numbers. This change makes network traffic harder to identify, and accurately identifying user demand has become a problem that urgently needs to be solved.
Summary of the invention
The present disclosure provides a method and system for user profiling and behavior analysis based on DPI technology, which achieve the technical purpose of accurately identifying user demand.
The above technical purpose of the present disclosure is achieved by the following technical solution:
A method for user profiling and behavior analysis based on DPI technology, comprising:
setting data protocol labels and a network interface card (NIC) name, each data protocol label corresponding to a first data buffer;
after initialization, intercepting packets from the NIC using zero-copy technology, and matching each packet against the data protocol labels;
if the match fails, intercepting the next packet;
if the match succeeds, copying the packet content, setting the data protocol label on the packet, and caching the packet in the first data buffer;
identifying and analyzing the packet using DPI technology to obtain the useful information of the packet, and caching the useful information in a second data buffer;
taking the useful information out of the second data buffer, splicing it into an SQL statement, and submitting the statement to a database;
querying user behavior and real-time flow table information from the database.
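The steps above form a three-stage pipeline joined by two buffers. The C++ sketch below shows only that data flow, under stated assumptions: `DataBuffer`, `Packet` and the three stage functions are hypothetical names, a mutex-protected `std::deque` stands in for each data buffer, and the analysis and storage stages are placeholders for DPI parsing and SQL submission.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical thread-safe FIFO standing in for a data buffer; in the described
// system each stage runs on its own thread, hence the lock.
template <typename T>
class DataBuffer {
public:
    void push(T item) {
        std::lock_guard<std::mutex> lock(mu_);
        items_.push_back(std::move(item));
    }
    // Non-blocking pop: returns std::nullopt when the buffer is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mu_;
    std::deque<T> items_;
};

// A captured packet reduced to the protocol it would match and its payload.
struct Packet {
    int proto;
    std::string payload;
};

// Stage 1: keep packets whose protocol has a configured label; drop the rest.
void capture_stage(const std::vector<Packet>& wire, const std::vector<int>& labels,
                   DataBuffer<Packet>& first) {
    for (const Packet& p : wire) {
        for (int label : labels) {
            if (label == p.proto) {
                first.push(p);  // copy the matched packet into the first buffer
                break;
            }
        }
    }
}

// Stage 2: placeholder for DPI analysis, producing one record per packet.
void analysis_stage(DataBuffer<Packet>& first, DataBuffer<std::string>& second) {
    while (auto p = first.try_pop())
        second.push("label=" + std::to_string(p->proto) + " " + p->payload);
}

// Stage 3: drain the second buffer; a real storage module would splice SQL here.
std::vector<std::string> storage_stage(DataBuffer<std::string>& second) {
    std::vector<std::string> rows;
    while (auto rec = second.try_pop()) rows.push_back(*rec);
    return rows;
}
```

Packets whose protocol has no configured label never reach the first buffer, which mirrors the "if the match fails, intercept the next packet" step.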
Further, the initialization process includes:
loading the data protocol label configuration;
initializing the packet capture function according to the NIC name and starting a packet capture thread;
dynamically loading a packet analysis dynamic library containing at least one packet analysis module, binding each packet analysis module to its corresponding first data buffer and data protocol label, and then starting the processing thread of each packet analysis module;
creating a database connection pool and starting a timed storage thread.
Further, the five-tuple information of the packet is obtained and matched against the data protocol labels.
Further, the DPI technology includes signature (feature-string) identification, application layer gateway identification and behavior pattern recognition; when the DPI technology is used, any one of these techniques, or a combination of any of them, is selected.
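As an illustration of the first of these techniques, signature (feature-string) identification, the sketch below matches the start of an application-layer payload against a small signature table. The table contents and the names `Signature` and `identify_by_signature` are hypothetical; application layer gateway and behavior pattern identification are not shown.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical signature table entry: a payload prefix and the protocol it indicates.
struct Signature {
    std::string prefix;
    std::string protocol;
};

// Feature-string identification: return the protocol of the first signature
// that is a prefix of the application-layer payload, or "unknown".
std::string identify_by_signature(std::string_view payload,
                                  const std::vector<Signature>& table) {
    for (const Signature& s : table) {
        if (payload.substr(0, s.prefix.size()) == std::string_view(s.prefix))
            return s.protocol;
    }
    return "unknown";
}
```

For example, with entries for the HTTP methods `"GET "` and `"POST "` and for `"SSH-"` (SSH identification strings begin with "SSH-"), an HTTP request is classified by its content regardless of which port it arrived on.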
A system for user profiling and behavior analysis based on DPI technology, comprising:
a first terminal, which sets data protocol labels and a network interface card (NIC) name, each data protocol label corresponding to a first data buffer;
a data acquisition module, which after initialization intercepts packets from the NIC using zero-copy technology;
the data acquisition module includes a matching unit, which matches each packet against the data protocol labels; if the match fails, the next packet is intercepted; if the match succeeds, the packet content is copied, the data protocol label is set on the packet, and the packet is cached in the first data buffer;
a packet analysis dynamic library, including the first data buffer and at least one packet analysis module, where the packet analysis module identifies and analyzes the packet using DPI technology to obtain the useful information of the packet and caches the useful information in a second data buffer;
a data storage module, including a database and the second data buffer, which takes the useful information out of the second data buffer, splices it into an SQL statement, and submits the statement to the database;
a Web query module, which queries the user behavior and real-time flow table information of the first terminal from the database.
Further, the Web query module includes:
a back-end query service unit, which provides an interface for querying the database;
a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
Further, the data storage module writes the useful information to the database asynchronously.
Further, the DPI technology includes signature (feature-string) identification, application layer gateway identification and behavior pattern recognition; when the DPI technology is used, any one of these techniques, or a combination of any of them, is selected.
Further, the five-tuple information of the packet is obtained and matched against the data protocol labels.
In summary, the beneficial effects of the present disclosure are as follows. After the first terminal sets the data protocol labels and the NIC name, the data acquisition module, the packet analysis module and the data storage module are initialized. The data acquisition module then intercepts packets using zero-copy technology; when an intercepted packet matches a data protocol label, the packet content is copied, the data protocol label is set on the packet, and the packet is added to the corresponding first data buffer. The packet analysis module identifies and analyzes the packet using DPI technology to obtain useful information, and the data storage module cyclically takes the useful information out of the second data buffer, splices it into SQL statements and submits them to the database. Finally, the user behavior and real-time flow table information of the first terminal can be queried from the database, so that user demand can be identified accurately.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present disclosure;
Fig. 2 is a schematic diagram of the system of the present disclosure;
Fig. 3 is a flowchart of the initialization process of the present disclosure;
Fig. 4 is a workflow diagram of the data acquisition module;
Fig. 5 is a workflow diagram of the HTTP packet analysis module;
Fig. 6 is a flowchart of the data storage module.
Specific embodiments
The present disclosure is described in further detail below with reference to the drawings.
It should be understood that, in the description of the present disclosure, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features concerned. The first data buffer of the packet analysis module stores raw packet data, and the second data buffer of the data storage module stores the useful content information that has been parsed successfully.
Fig. 1 is a flowchart of the method of the present disclosure, and Fig. 2 is a schematic diagram of the system. The system comprises a first terminal, a data acquisition module, a packet analysis dynamic library, a data storage module and a Web query module. The data acquisition module includes a matching unit; the packet analysis dynamic library includes the first data buffer and at least one packet analysis module; the data storage module includes the database and the second data buffer. The Web query module includes a back-end query service unit, which provides an interface for querying the database, and a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
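A back-end query of the kind the Web query module relies on might answer "which hosts does this user visit most often". The sketch below does this over in-memory rows shaped like the user_visit_host table of the data storage module (source IP and Host). The `Visit` type and `top_hosts` function are hypothetical; a deployed service would more likely run an equivalent `GROUP BY` query in the database.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One row of the visit log: which source IP accessed which Host.
struct Visit {
    std::string ip;
    std::string host;
};

// Hosts visited by `ip` with visit counts, most frequent first (ties by name).
std::vector<std::pair<std::string, int>> top_hosts(const std::vector<Visit>& visits,
                                                   const std::string& ip) {
    std::map<std::string, int> counts;
    for (const Visit& v : visits)
        if (v.ip == ip) ++counts[v.host];
    std::vector<std::pair<std::string, int>> out(counts.begin(), counts.end());
    std::sort(out.begin(), out.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    return out;
}
```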
The first terminal sets the data protocol labels and the NIC name; each data protocol label corresponds to a first data buffer. The data acquisition module intercepts packets from the NIC in zero-copy mode, and the matching unit matches each packet against the data protocol labels. If the match fails, the next packet is intercepted; if it succeeds, the packet content is copied, the data protocol label is set on the packet, and the packet is added to the first data buffer. The packet analysis module identifies and analyzes the packets cached in the first data buffer using DPI technology to obtain their useful information, which it then caches in the second data buffer. The data storage module takes the useful information out of the second data buffer, splices it into SQL statements and submits them to the database. The Web query module queries the user behavior and real-time flow table information of the first terminal from the database.
After a packet is intercepted, its five-tuple information can be obtained first and then matched against the data protocol labels. In addition, the data storage module extracts the useful information from the second data buffer in a loop.
Before data acquisition and packet analysis, the system is initialized; Fig. 3 is the flowchart of the initialization process. To initialize the data acquisition module, the data protocol label configuration is loaded first; then, according to the NIC name set by the first terminal, the packet capture function is initialized and the packet capture thread is started, with zero-copy technology providing high-performance packet capture.
Loading the data protocol label configuration means reading the mapping between five-tuples and protocol types into memory, so that the data acquisition module can quickly determine whether a packet needs to be captured and attach the data protocol label to each packet that does. The mapping between five-tuples and data protocol label types is configured as shown in Table 1:
Protocol number (data protocol label) | Filter type | Protocol type | IP | Mask | Port |
---|---|---|---|---|---|
1 | 0x01 | 0x11 | 0 | 0 | 80 |
1 | 0x01 | 0x11 | 0 | 0 | 8080 |
2 | 0x01 | 0x11 | 0 | 0 | 53 |
4 | 0x01 | 0x11 | 0 | 0 | 443 |
Table 1
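Matching a parsed five-tuple against rows like those in Table 1 can be sketched as below. The sketch makes assumptions the text does not state: IP 0 with mask 0 acts as a wildcard, a rule matches when either endpoint uses the configured port, the Filter type column is carried but not interpreted, and Protocol type is compared with the IPv4 protocol field. Under that last reading 0x11 is the IANA number for UDP, so if the table uses a project-specific protocol code, a lookup would be needed instead. Every matching label is returned, since a packet may carry several. All type and function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Five-tuple of a parsed packet (addresses in host byte order).
struct FiveTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t protocol;            // IPv4 protocol field
};

// One row of Table 1.
struct LabelRule {
    int label;                   // "Protocol number": the data protocol label
    uint8_t filter_type;         // carried along, not interpreted here
    uint8_t protocol;
    uint32_t ip, mask;           // ip = 0, mask = 0 matches any address
    uint16_t port;
};

static bool ip_matches(uint32_t addr, const LabelRule& r) {
    return (addr & r.mask) == (r.ip & r.mask);
}

// Every label whose rule matches; an empty result means "do not capture".
std::vector<int> match_labels(const FiveTuple& t, const std::vector<LabelRule>& rules) {
    std::vector<int> labels;
    for (const LabelRule& r : rules) {
        if (r.protocol != t.protocol) continue;
        bool to_port = t.dst_port == r.port && ip_matches(t.dst_ip, r);
        bool from_port = t.src_port == r.port && ip_matches(t.src_ip, r);
        if (to_port || from_port) labels.push_back(r.label);
    }
    return labels;
}
```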
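To initialize the packet analysis modules, all packet analysis modules in a specified directory are first loaded dynamically. Every packet analysis module implements the same base-class interface, which is what allows it to be loaded. The interface class is defined as follows:
#include <cstdint>

struct QueueInfo;  // descriptor of a module's first data buffer; defined elsewhere in the system

// Unified base class implemented by every packet analysis module.
class ParserBase
{
public:
    ParserBase() {}
    virtual ~ParserBase() {}
    // Binds the module to its buffer; the default implementation does nothing and returns 0.
    virtual int32_t Init(int32_t ithreadnum, QueueInfo *que) { (void)ithreadnum; (void)que; return 0; }
    virtual void Run() = 0;          // body of the module's processing thread
    virtual int32_t GetTagId() = 0;  // data protocol label handled by this module
};
// Signatures of the factory and destroy functions exported by each module.
typedef ParserBase *create_t();
typedef void destory_t(ParserBase *base);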
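To show how a module could plug into this interface, the sketch below repeats the base class and adds a hypothetical `HttpParser` plus a factory pair matching `create_t` and `destory_t`. The names `create` and `destroy`, the layout of `QueueInfo` and the packet-counting `Run()` are assumptions for illustration; a loader would typically resolve such `extern "C"` functions from each dynamic library with `dlopen`/`dlsym`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout of the buffer descriptor passed to Init().
struct QueueInfo {
    std::vector<std::string> packets;  // raw packets waiting in the first data buffer
};

class ParserBase {
public:
    ParserBase() {}
    virtual ~ParserBase() {}
    virtual int32_t Init(int32_t ithreadnum, QueueInfo *que) { (void)ithreadnum; (void)que; return 0; }
    virtual void Run() = 0;
    virtual int32_t GetTagId() = 0;
};
typedef ParserBase *create_t();
typedef void destory_t(ParserBase *base);

// Hypothetical module: binds to one buffer and counts the packets it consumes.
class HttpParser : public ParserBase {
public:
    int32_t Init(int32_t ithreadnum, QueueInfo *que) override {
        (void)ithreadnum;
        que_ = que;
        return que_ != nullptr ? 0 : -1;  // -1: no buffer to bind
    }
    void Run() override {                  // one pass over the bound buffer
        while (que_ != nullptr && !que_->packets.empty()) {
            que_->packets.pop_back();      // a real module would parse the packet here
            ++handled_;
        }
    }
    int32_t GetTagId() override { return 1; }  // label 1 covers ports 80 and 8080 in Table 1
    int handled() const { return handled_; }

private:
    QueueInfo *que_ = nullptr;
    int handled_ = 0;
};

// Entry points a loader could resolve from the module's dynamic library.
extern "C" ParserBase *create() { return new HttpParser(); }
extern "C" void destroy(ParserBase *base) { delete base; }
```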
To initialize the data storage module, a connection to the database is created so that analysis results can be saved in the database, and the timed storage thread is started; the saved results are then available to the Web query module. Once initialization is complete, the data acquisition module starts running; its workflow is shown in Fig. 4. First, a raw packet is obtained from the NIC in zero-copy mode. Its headers are then parsed layer by layer, from the data link layer through the network layer to the transport layer, to obtain the five-tuple and the starting position of the application-layer data. The parsed five-tuple is matched against the five-tuple/protocol type mapping table. If the match fails, the packet does not need to be analyzed; otherwise the packet is copied and stamped with data protocol labels. A packet may carry more than one data protocol label.
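The layer-by-layer header parsing can be sketched as follows for the common case of Ethernet II, IPv4, and TCP or UDP. The function name `parse_headers` and the choice to reject VLAN-tagged, IPv6 and fragmented packets are simplifications for illustration; the byte offsets are those of the standard header layouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct FiveTuple {
    uint32_t src_ip, dst_ip;      // host byte order
    uint16_t src_port, dst_port;
    uint8_t protocol;             // IPv4 protocol field (6 = TCP, 17 = UDP)
};

struct ParsedHeaders {
    FiveTuple tuple;
    std::size_t payload_offset;   // where application-layer data starts
};

static uint16_t be16(const uint8_t* p) { return static_cast<uint16_t>((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | p[3];
}

// Data link -> network -> transport: Ethernet II, IPv4, then TCP or UDP.
// Anything else (VLAN tag, IPv6, fragment, truncated frame) yields nullopt.
std::optional<ParsedHeaders> parse_headers(const uint8_t* frame, std::size_t len) {
    const std::size_t eth = 14;                           // Ethernet II header length
    if (len < eth + 20 || be16(frame + 12) != 0x0800) return std::nullopt;
    const uint8_t* ip = frame + eth;
    std::size_t ihl = std::size_t(ip[0] & 0x0F) * 4;      // IPv4 header length in bytes
    if ((ip[0] >> 4) != 4 || ihl < 20 || len < eth + ihl) return std::nullopt;
    if (be16(ip + 6) & 0x3FFF) return std::nullopt;       // MF flag or fragment offset set
    FiveTuple t{be32(ip + 12), be32(ip + 16), 0, 0, ip[9]};
    const std::size_t l4 = eth + ihl;
    std::size_t l4_len = 0;
    if (t.protocol == 6) {                                // TCP: data offset in 32-bit words
        if (len < l4 + 20) return std::nullopt;
        l4_len = std::size_t(frame[l4 + 12] >> 4) * 4;
        if (l4_len < 20 || len < l4 + l4_len) return std::nullopt;
    } else if (t.protocol == 17) {                        // UDP: fixed 8-byte header
        if (len < l4 + 8) return std::nullopt;
        l4_len = 8;
    } else {
        return std::nullopt;
    }
    t.src_port = be16(frame + l4);
    t.dst_port = be16(frame + l4 + 2);
    return ParsedHeaders{t, l4 + l4_len};
}
```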
Finally, according to the data protocol labels it carries, the packet is distributed to the corresponding first data buffers to await processing.
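Distribution by label can be sketched as a lookup from each label to the first data buffer bound to it. Here a `std::map` of vectors stands in for the per-module buffers, and `LabeledPacket`, `FirstBuffers` and `dispatch` are hypothetical names; the real buffers would be thread-safe queues shared with the analysis threads.

```cpp
#include <map>
#include <string>
#include <vector>

// A captured packet copy plus every data protocol label it matched.
struct LabeledPacket {
    std::vector<int> labels;
    std::string bytes;
};

// Hypothetical routing table: label -> first data buffer of the bound module.
using FirstBuffers = std::map<int, std::vector<std::string>>;

// Queue one copy of the packet per matched label. Labels with no bound module
// are skipped; returns the number of copies queued.
int dispatch(const LabeledPacket& pkt, FirstBuffers& buffers) {
    int queued = 0;
    for (int label : pkt.labels) {
        auto it = buffers.find(label);
        if (it == buffers.end()) continue;
        it->second.push_back(pkt.bytes);
        ++queued;
    }
    return queued;
}
```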
After data acquisition is complete, the packet analysis module identifies and analyzes the captured packets using DPI technology. DPI technology includes signature (feature-string) identification, application layer gateway identification and behavior pattern recognition; the present disclosure uses any one of them, or a combination of any of them. Analysis methods differ slightly from protocol to protocol; the HTTP protocol analysis module is described in detail below as an example, as shown in Fig. 5.
A packet is taken out of the first data buffer corresponding to the HTTP protocol, and the HTTP header content is obtained. If HTTP header features are found, the key content is extracted by feature strings using a high-performance regular-expression matching library. The feature strings include GET, POST, PUT, Host, User-Agent and so on, and the extracted key content includes the Host, the URL and terminal information. Once extraction succeeds, the extracted content (the useful information) is put into the second data buffer to wait to be saved in the database.
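The sketch below extracts the same key content (method, URL, Host and User-Agent) from an HTTP request head. It deliberately swaps the high-performance regular-expression library for plain `std::string_view` scanning to stay dependency-free; the method list, the `HttpInfo` type and the function names are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>
#include <string_view>

// Fields kept as "useful information" for one HTTP request.
struct HttpInfo {
    std::string method, url, host, user_agent;
};

// Case-insensitive comparison (HTTP header names are case-insensitive).
static bool iequals(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    return true;
}

// Value of header `name` within CRLF-separated header lines, or "" if absent.
static std::string header_value(std::string_view headers, std::string_view name) {
    while (!headers.empty()) {
        std::size_t eol = headers.find("\r\n");
        std::string_view line = headers.substr(0, eol);
        std::size_t colon = line.find(':');
        if (colon != std::string_view::npos && iequals(line.substr(0, colon), name)) {
            std::string_view v = line.substr(colon + 1);
            while (!v.empty() && v.front() == ' ') v.remove_prefix(1);
            return std::string(v);
        }
        if (eol == std::string_view::npos) break;
        headers.remove_prefix(eol + 2);
    }
    return "";
}

// Returns nullopt when the payload does not start with a known request method.
std::optional<HttpInfo> parse_http_request(std::string_view payload) {
    for (std::string_view m : {"GET ", "POST ", "PUT ", "HEAD ", "DELETE "}) {
        if (payload.substr(0, m.size()) != m) continue;  // not this method
        std::string_view head = payload.substr(0, payload.find("\r\n\r\n"));
        std::size_t eol = head.find("\r\n");
        std::string_view request_line = head.substr(0, eol);
        std::size_t url_end = request_line.find(' ', m.size());
        HttpInfo info;
        info.method = std::string(m.substr(0, m.size() - 1));  // drop trailing space
        info.url = std::string(request_line.substr(m.size(), url_end - m.size()));
        std::string_view headers =
            eol == std::string_view::npos ? std::string_view() : head.substr(eol + 2);
        info.host = header_value(headers, "Host");
        info.user_agent = header_value(headers, "User-Agent");
        return info;
    }
    return std::nullopt;
}
```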
Fig. 6 is the workflow diagram of the data storage module. Taking the HTTP protocol as an example, the process is as follows. A record is taken from the second data buffer and its Host is inserted into the http_host table; if the Host already exists, only its access time is updated. In either case, the ID of the Host, the URL and the User-Agent are then inserted into the http_parse table; if the URL already exists, its access time is updated. In either case, the source IP and the Host are then stored in the user_visit_host table.
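The three-table flow can be expressed as spliced SQL. This sketch assumes MySQL syntax (`INSERT ... ON DUPLICATE KEY UPDATE`), a unique key on the host column of http_host and on the host-ID/URL pair of http_parse, and hypothetical column names; only the three table names come from the text. Because values are spliced into SQL text, they are escaped first.

```cpp
#include <string>
#include <vector>

// One item taken from the second data buffer.
struct HttpRecord {
    std::string src_ip, host, url, user_agent;
};

// Quote a value as a SQL string literal (MySQL default mode: escape ' and \).
std::string sql_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'' || c == '\\') out += c;  // escape by doubling
        out += c;
    }
    return out + "'";
}

// Splice the three statements for one record (hypothetical column names).
std::vector<std::string> build_statements(const HttpRecord& r) {
    const std::string host = sql_quote(r.host);
    return {
        // 1. new Host: insert; repeated Host: refresh its access time
        "INSERT INTO http_host (host, last_access) VALUES (" + host +
            ", NOW()) ON DUPLICATE KEY UPDATE last_access = NOW()",
        // 2. Host ID, URL and User-Agent; repeated URL: refresh its access time
        "INSERT INTO http_parse (host_id, url, user_agent, last_access) VALUES "
        "((SELECT id FROM http_host WHERE host = " + host + "), " + sql_quote(r.url) +
            ", " + sql_quote(r.user_agent) + ", NOW()) ON DUPLICATE KEY UPDATE last_access = NOW()",
        // 3. which source IP visited which Host
        "INSERT INTO user_visit_host (src_ip, host) VALUES (" + sql_quote(r.src_ip) + ", " + host + ")",
    };
}
```

Splicing follows the text; prepared statements with bound parameters would avoid the escaping step altogether.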
In summary, the first terminal described in the present disclosure includes both virtual and non-virtual terminals; any terminal that can implement the method of the present disclosure is a first terminal of the system of the present disclosure.
The above is an exemplary embodiment of the present disclosure; the scope of protection of the present disclosure is defined by the claims and their equivalents.
Claims (9)
1. A method for user profiling and behavior analysis based on DPI technology, characterized by comprising:
setting data protocol labels and a network interface card (NIC) name, each data protocol label corresponding to a first data buffer;
after initialization, intercepting packets from the NIC using zero-copy technology, and matching each packet against the data protocol labels;
if the match fails, intercepting the next packet;
if the match succeeds, copying the packet content, setting the data protocol label on the packet, and caching the packet in the first data buffer;
identifying and analyzing the packet using DPI technology to obtain the useful information of the packet, and caching the useful information in a second data buffer;
taking the useful information out of the second data buffer, splicing it into an SQL statement, and submitting the statement to a database;
querying user behavior and real-time flow table information from the database.
2. the method for user portrait and behavioural analysis based on DPI technology as described in claim 1, which is characterized in that described
The process of initialization includes:
Load the data protocol label configuration;
According to the network interface card title Initial message capturing function and start message capturing thread;
Dynamically load message analysis dynamic base, the message analysis dynamic base include at least one message analysis module, are each
The message analysis module binds corresponding first data buffer area and the data protocol label, then starts the report
The processing thread of literary analysis module;
Create database connection pool, starting timing storage thread.
3. the method for user portrait and behavioural analysis based on DPI technology as claimed in claim 1 or 2, which is characterized in that obtain
The five-tuple information for taking the message matches the five-tuple information with the data protocol label.
4. the method for user portrait and behavioural analysis based on DPI technology as claimed in claim 3, which is characterized in that described
DPI technology includes tagged word identification technology, application layer gateway identification technology and behavior pattern recognition technology, uses the DPI skill
Any one of them or the combination of any one technology identified above are selected when art.
5. A system for user profiling and behavior analysis based on DPI technology, characterized by comprising:
a first terminal, which sets data protocol labels and a network interface card (NIC) name, each data protocol label corresponding to a first data buffer;
a data acquisition module, which after initialization intercepts packets from the NIC using zero-copy technology;
the data acquisition module includes a matching unit, which matches each packet against the data protocol labels; if the match fails, the next packet is intercepted; if the match succeeds, the packet content is copied, the data protocol label is set on the packet, and the packet is cached in the first data buffer;
a packet analysis dynamic library, including the first data buffer and at least one packet analysis module, where the packet analysis module identifies and analyzes the packet using DPI technology to obtain the useful information of the packet and caches the useful information in a second data buffer;
a data storage module, including a database and the second data buffer, which takes the useful information out of the second data buffer, splices it into an SQL statement, and submits the statement to the database;
a Web query module, which queries the user behavior and real-time flow table information of the first terminal from the database.
6. The system for user profiling and behavior analysis based on DPI technology according to claim 5, characterized in that the Web query module includes:
a back-end query service unit, which provides an interface for querying the database;
a front-end query unit, which queries and analyzes the user behavior information of the first terminal.
7. The system for user profiling and behavior analysis based on DPI technology according to claim 5 or 6, characterized in that the data storage module writes the useful information to the database asynchronously.
8. The system for user profiling and behavior analysis based on DPI technology according to claim 7, characterized in that the DPI technology includes signature (feature-string) identification, application layer gateway identification and behavior pattern recognition, and when the DPI technology is used, any one of these techniques, or a combination of any of them, is selected.
9. The system for user profiling and behavior analysis based on DPI technology according to claim 8, characterized in that the five-tuple information of the packet is obtained and matched against the data protocol labels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910855391.8A CN110381094A (en) | 2019-09-11 | 2019-09-11 | A kind of method and system of user portrait and behavioural analysis based on DPI technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910855391.8A CN110381094A (en) | 2019-09-11 | 2019-09-11 | A kind of method and system of user portrait and behavioural analysis based on DPI technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110381094A (en) | 2019-10-25 |
Family
ID=68261395
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910855391.8A Pending CN110381094A (en) | 2019-09-11 | 2019-09-11 | A kind of method and system of user portrait and behavioural analysis based on DPI technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110381094A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105491018A (en) * | 2015-11-24 | 2016-04-13 | 北京中电普华信息技术有限公司 | System and method for network data security analysis based on DPI technology |
CN106982150A (en) * | 2017-03-27 | 2017-07-25 | 重庆邮电大学 | A kind of mobile Internet user behavior analysis method based on Hadoop |
CN107040405A (en) * | 2017-03-13 | 2017-08-11 | 中国人民解放军信息工程大学 | Passive type various dimensions main frame Fingerprint Model construction method and its device under network environment |
US20170289283A1 (en) * | 2016-04-01 | 2017-10-05 | App Annie Inc. | Automated dpi process |
2019
- 2019-09-11 CN CN201910855391.8A (CN110381094A) Active, Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177595A (en) * | 2019-12-20 | 2020-05-19 | 杭州九略智能科技有限公司 | Method for extracting asset information in template mode aiming at HTTP (hyper text transport protocol) |
CN111177595B (en) * | 2019-12-20 | 2024-04-05 | 杭州九略智能科技有限公司 | Method for extracting asset information by templating HTTP protocol |
Similar Documents
Publication | Title |
---|---|
US6694307B2 (en) | System for collecting specific information from several sources of unstructured digitized data | |
US6996798B2 (en) | Automatically deriving an application specification from a web-based application | |
CN102394885B (en) | Information classification protection automatic verification method based on data stream | |
US20090319515A1 (en) | System and method for managing entity knowledgebases | |
US20120215765A1 (en) | Systems and Methods for Generating Statistics from Search Engine Query Logs | |
CN103248677B (en) | The Internet behavioural analysis system and method for work thereof | |
EP1869583A1 (en) | Content adaptation | |
US20020143808A1 (en) | Intelligent document linking system | |
US7032017B2 (en) | Identifying unique web visitors behind proxy servers | |
CN101739453A (en) | Method and device for carrying out condition query on database table | |
CN102098331A (en) | Method and system for reducing WEB type application contents | |
CN109948343A (en) | Leak detection method, Hole Detection device and computer readable storage medium | |
CN105763543A (en) | Phishing site identification method and device | |
WO2017185912A1 (en) | Method and apparatus for collecting statistics about terminal device information based on hash node | |
CN103530429A (en) | Webpage content extracting method | |
CN103729479A (en) | Web page content statistical method and system based on distributed file storage | |
CN112434224A (en) | Tax preferential policy recommendation method and system based on knowledge graph | |
US20050055335A1 (en) | Search system and method | |
CN109857923A (en) | A kind of news intelligent recommendation method and system based on area media | |
CN110381094A (en) | A kind of method and system of user portrait and behavioural analysis based on DPI technology | |
JP2022105474A (en) | Method for verifying vulnerabilities of network devices using cve entries | |
CN111556039B (en) | Web data export method and device for general microservice | |
CN112003884B (en) | Method for collecting network assets and retrieving natural language | |
CN106982147B (en) | Communication monitoring method and device for Web communication application | |
CN111798351A (en) | Data processing method and device and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191025 |