
CN107798571B - Malice address/malice order identifying system, method and device - Google Patents

Publication number
CN107798571B
CN107798571B CN201610797563.7A CN201610797563A CN107798571B CN 107798571 B CN107798571 B CN 107798571B CN 201610797563 A CN201610797563 A CN 201610797563A CN 107798571 B CN107798571 B CN 107798571B
Authority
CN
China
Prior art keywords
address
identified
probability
order
malicious
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610797563.7A
Other languages
Chinese (zh)
Other versions
CN107798571A (en
Inventor
肖谦
赵争超
林君
潘林林
张一昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610797563.7A (CN107798571B)
Priority to TW106119860A (TW201812689A)
Priority to PCT/CN2017/097953 (WO2018040944A1)
Publication of CN107798571A
Application granted
Publication of CN107798571B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G06Q30/0637 Approvals

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a system, a method and a device for identifying a malicious address/malicious order. It relates to the field of Internet technology and addresses the low accuracy of malicious address/malicious order identification in the prior art. The method mainly comprises: receiving an address to be identified sent by a user client; performing address hierarchy processing on the address to be identified to obtain each address level of the address; using an address-level jump probability distribution obtained by analyzing historical normal addresses to calculate the jump probability of each address level in the address to be identified jumping to the next adjacent address level, where the distribution includes the jump probability of any one address level jumping to another address level; and multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified.

Description

System, method and device for identifying malicious address/malicious order
Technical Field
The invention relates to the technical field of internet, in particular to a system, a method and a device for identifying a malicious address/a malicious order.
Background
With the development of internet technology, people can not only realize operations such as watching videos, browsing webpages and chatting, but also can carry out shopping through the network, and the operation process of shopping is very convenient.
However, in practical applications, some buyers deliberately fill in incomplete or wrong delivery addresses, or behave maliciously in other ways, so that the goods cannot be delivered, which causes economic loss and loss of credit to merchants. The existing methods for identifying malicious addresses are mainly of three types: (1) matching the address to be identified against preset malicious keywords to determine whether it is a malicious address; (2) matching the address to be identified against the addresses in a black-and-white list to determine whether it is a malicious address; (3) dividing the address to be identified into a hierarchical structure and matching it against a preset address hierarchy structure to determine whether it is a malicious address.
Although all three approaches can identify malicious addresses to some extent, they cannot identify some well-hidden malicious addresses, and they may misjudge normal addresses as malicious. For example, the same keyword may be malicious in one address but normal in another, so if a keyword is preset as malicious, a normal address that contains it may be mistaken for a malicious address. As another example, the black-and-white list is maintained manually from actual feedback after merchants ship their goods, so identification based on the list not only consumes manpower but also cannot identify a new malicious address in time. As a further example, an address whose hierarchy is complete but which does not exist in real life will be mistaken for a normal address if it is checked only against the preset address hierarchy structure. The accuracy of identifying malicious addresses in the prior art is therefore low, and so is the accuracy of identifying malicious orders.
Disclosure of Invention
In view of this, the present invention provides a system, a method and a device for identifying a malicious address/a malicious order, which can solve the problem of low accuracy in identifying a malicious address/a malicious order in the prior art.
In a first aspect, the invention provides a system for identifying a malicious address, which comprises a user client, a server and a merchant client; wherein,
the user client is used for receiving an input address to be identified and sending the address to be identified to the server;
the server is used for receiving the address to be identified sent by the user client and carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified; calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy; multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified, and sending the identification result of malicious address identification based on the normal address probability to the merchant client;
and the merchant client is used for receiving and outputting the identification result sent by the server.
In a second aspect, the present invention provides a method for identifying a malicious address, where the method includes:
receiving an address to be identified sent by a user client;
carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified;
calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
and multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified.
In a third aspect, the present invention provides an apparatus for identifying a malicious address, including:
the receiving unit is used for receiving the address to be identified sent by the user client;
the first processing unit is used for carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified;
the calculation unit is used for calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any address hierarchy jumping to another address hierarchy;
and the second processing unit is used for multiplying the jump probabilities obtained by the calculating unit to obtain the normal address probability of the address to be identified.
In a fourth aspect, the present invention provides a system for identifying malicious orders, where the system includes a user client, a server, and a merchant client; wherein,
the user client is used for receiving an input order to be identified and sending the order to be identified to the server;
the server is used for receiving the order to be identified sent by the user client, and calculating the jump probability of each address level jumping to the next adjacent address level in the address of the order to be identified based on the address level jump probability distribution obtained by analyzing historical normal addresses, wherein the address level jump probability distribution comprises the jump probability of any address level jumping to another address level; multiplying the obtained jump probabilities to obtain the normal address probability of the address; judging whether the order to be identified is a malicious order or not according to the normal address probability, and sending a judgment result to the merchant client;
and the merchant client is used for receiving and displaying the judgment result sent by the server.
In a fifth aspect, the present invention provides a method for identifying a malicious order, the method including:
receiving an order to be identified sent by a user client;
calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the addresses of the order to be identified based on the address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
multiplying the obtained jump probabilities to obtain the normal address probability of the address;
and judging whether the order to be identified is a malicious order or not according to the normal address probability.
In a sixth aspect, the present invention provides an apparatus for identifying a malicious order, the apparatus comprising:
the receiving unit is used for receiving the order to be identified sent by the user client;
the calculating unit is used for calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the addresses of the order to be identified based on the address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
the processing unit is used for multiplying the obtained jump probabilities to obtain the normal address probability of the address;
and the judging unit is used for judging whether the order to be identified is a malicious order or not according to the normal address probability.
By means of the technical scheme, the system, the method and the device for identifying the malicious address/the malicious order can perform address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified after the server obtains the address to be identified and address hierarchy jump probability distribution obtained by analyzing historical normal addresses, then calculate the jump probability of each address hierarchy in the address to be identified jumping to the next adjacent address hierarchy by using the obtained address hierarchy jump probability distribution, and multiply the jump probabilities to obtain the probability that the address to be identified belongs to the normal address, so that whether the address to be identified is the malicious address or not is judged according to the probability, or whether the order including the address to be identified is the malicious order or not is judged according to the probability. 
The prior art judges whether an address to be identified is malicious by coarse filtering with malicious keywords, a black-and-white list, or an address hierarchy structure. In contrast, the invention counts and analyzes the correlation among the address levels in historical normal addresses, uses the analysis result to determine the jump probability of each address level of the address to be identified, and then obtains from these jump probabilities the probability that the whole address belongs to a normal address. In this way it can obtain the normal address probability not only of addresses that contain malicious keywords, appear in the black-and-white list, or have a complete address hierarchy, but also of addresses that contain no malicious keywords, do not appear in the black-and-white list, or have an incomplete address hierarchy. Whether the address to be identified is a malicious address can then be determined from the normal address probability, and whether the order to be identified is a malicious order can be determined from whether its address is malicious, which improves the accuracy of malicious address/malicious order identification.
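The final decision step described above can be sketched as a simple comparison. This is only an illustrative sketch: the text says the judgment is made "according to the normal address probability" but does not state how, so the fixed cut-off, its value, and the function name `is_malicious_order` are all assumptions.

```python
# Hypothetical cut-off; the source does not specify a threshold or a decision rule.
ASSUMED_THRESHOLD = 1e-6


def is_malicious_order(normal_address_probability: float,
                       threshold: float = ASSUMED_THRESHOLD) -> bool:
    """Flag an order when its address is too unlikely to be a normal address."""
    return normal_address_probability < threshold


if __name__ == "__main__":
    print(is_malicious_order(0.2))    # address looks normal
    print(is_malicious_order(1e-12))  # address looks abnormal
```

In practice such a cut-off would have to be tuned on labeled historical orders; a single global value is used here only to keep the sketch short.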
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic diagram illustrating a system for identifying a malicious address according to an embodiment of the present invention;
FIG. 2 illustrates a merchant client-side selection interface diagram provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for identifying a malicious address according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another malicious address identification method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating interaction between a server and a client in a malicious address identification process according to an embodiment of the present invention;
FIG. 6 is a block diagram illustrating a malicious address identification apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram illustrating another malicious address identification apparatus according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a method for identifying a malicious order according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating a malicious order identification apparatus according to an embodiment of the present invention;
FIG. 10 is a block diagram illustrating another malicious order identification apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to improve the accuracy of identifying a malicious address, an embodiment of the present invention provides a system for identifying a malicious address, as shown in fig. 1, the system includes a user client 11, a server 12, and a merchant client 13; wherein,
the user client 11 is configured to receive an input address to be identified, and send the address to be identified to the server 12;
the server 12 is configured to receive an address to be identified sent by the user client 11, and perform address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified; calculating the jump probability of each address level in the address to be identified jumping to the next adjacent address level by utilizing the address level jump probability distribution obtained by analyzing the historical normal address, wherein the address level jump probability distribution comprises the jump probability of any address level jumping to another address level; multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified, and sending the identification result of malicious address identification based on the normal address probability to the merchant client 13;
the merchant client 13 is used for receiving and outputting the identification result sent by the server 12.
In the system for identifying a malicious address provided by this embodiment of the invention, after the server receives the address to be identified sent by the user client, it performs address hierarchy processing on the address to obtain each of its address levels, uses the address-level jump probability distribution to calculate the jump probability of each address level jumping to the next adjacent address level, and multiplies the jump probabilities to obtain the probability that the address to be identified belongs to a normal address, so that whether the address is malicious can be judged from this probability. The prior art judges whether an address to be identified is malicious by coarse filtering with malicious keywords, a black-and-white list, or an address hierarchy structure. In contrast, the invention counts and analyzes the correlation among the address levels in historical normal addresses, uses the analysis result to determine the jump probability of each address level of the address to be identified, and then obtains from these jump probabilities the probability that the whole address belongs to a normal address. In this way it can obtain the normal address probability not only of addresses that contain malicious keywords, appear in the black-and-white list, or have a complete address hierarchy, but also of addresses that contain no malicious keywords, do not appear in the black-and-white list, or have an incomplete address hierarchy. Whether the address to be identified is a malicious address can then be determined from the normal address probability, which improves the accuracy of malicious address identification.
Further, the server 12 is configured to send an early warning prompt message to the merchant client 13 when the identification result is that the address to be identified is a malicious address;
the merchant client 13 is used for receiving and outputting the early warning prompt information sent by the server 12.
Further, the merchant client 13 is configured to output a selection interface for selecting an identification result of performing secondary identification on the address to be identified after receiving the warning prompt information, receive an identification result of the secondary identification input based on the selection interface, and return the identification result of the secondary identification to the server 12.
For example, as shown in FIG. 2, after the merchant client receives the early warning prompt information, it displays the prompt on the interface together with a selection interface on which the merchant selects the secondary identification result. The selection interface may, for example, show the text "please contact the buyer to confirm again whether the address is a malicious address" and provide two buttons, "yes" and "no", for the merchant to choose from.
It should be noted that the warning prompt message may be located on the selection interface or on another interface.
Further, when no early warning prompt information has been received, the merchant client 13 is configured to output a selection interface for selecting an identification result of performing secondary identification on the address to be identified, receive an identification result that is input based on the selection interface and indicates that the address to be identified is a malicious address, and return the address to be identified, carrying a malicious identifier, to the server 12.
Further, according to the above system embodiment, another embodiment of the present invention further provides a method for identifying a malicious address, as shown in fig. 3, the method mainly includes:
201. Receiving the address to be identified sent by the user client.
After the user places an order successfully, the user client (namely, the buyer client) can upload the order to the server, and after receiving the order, the server can perform malicious address identification operation on the order and send the order and the identification result of the order to the merchant client so that the merchant can perform corresponding processing on the order according to the identification result. Since some meaningless data often exist in the order to be identified received by the server, in order to prevent the data from interfering with the identification of the address to be identified, after the order to be identified is obtained, the server needs to pre-process the order to be identified, and then extract the address to be identified from the pre-processed order to be identified.
Therefore, the specific implementation process of acquiring the address to be identified may be: acquiring an order to be identified; carrying out redundancy processing and formatting processing on the order to be identified; and acquiring the address to be identified from the processed order to be identified.
The redundancy processing and formatting processing of the order to be identified specifically comprise:
(1) Filtering out the characters in the address to be identified of the order to be identified that meet preset filtering conditions.
Since the user may put emoticons, meaningless English letters and other meaningless data into the address, the server can detect whether the address to be identified contains such information and, if so, filter it out.
(2) Filtering out dirty data in the order to be identified.
Since the server may store some dirty data when storing the order to be identified, such as HTML (HyperText Markup Language) text, JSON (JavaScript Object Notation) strings and other abnormal information, the server can filter out this dirty data.
(3) Formatting the filtered order to be identified according to a preset formatting rule.
Since the user may add spaces, use traditional Chinese characters, use pinyin and so on when filling in the address, the telephone number and other information, formatting operations such as space removal, full-width/half-width conversion, traditional-to-simplified Chinese conversion and pinyin conversion are required after filtering the order to be identified, so that the resulting address has a uniform format and the address to be identified can be recognized accurately in the subsequent steps.
It should be noted that, when analyzing the historical normal addresses and the historical malicious addresses, the preprocessing operation also needs to be performed.
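The filtering and formatting steps above can be sketched with the Python standard library. This is a minimal sketch under assumptions: the function name `preprocess_address`, the emoji code-point ranges and the tag pattern are illustrative choices, not the preset rules referred to in the text, and traditional-to-simplified and pinyin conversion are omitted because they need an external dictionary.

```python
import html
import re
import unicodedata

# HTML tags left over from storage (illustrative pattern, not a full HTML parser).
TAG_RE = re.compile(r"<[^>]+>")
# A rough, assumed range covering common emoji and pictographs.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess_address(raw: str) -> str:
    """Strip markup and emoji, then normalize width and remove whitespace."""
    text = TAG_RE.sub("", raw)                  # drop HTML tags
    text = html.unescape(text)                  # decode entities such as &amp;
    text = EMOJI_RE.sub("", text)               # drop emoticons
    text = unicodedata.normalize("NFKC", text)  # full-width -> half-width
    return re.sub(r"\s+", "", text)             # remove spaces


if __name__ == "__main__":
    print(preprocess_address("<b>四川省 成都市</b>５号楼😀"))
```

NFKC normalization is used here for the full-width/half-width step because it maps full-width digits and Latin letters to their ASCII forms; a production rule set would likely be more selective.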
202. Carrying out address hierarchy processing on the address to be identified to obtain each address level of the address to be identified.
Because each level of an address depends only on the level immediately before it and not on other levels, the address hierarchy has the Markov property, so address hierarchy processing can be performed with a conditional random field model. Specifically, after the address to be identified is obtained, the server performs word segmentation and address-level labeling on it with the conditional random field model, thereby obtaining each address level of the address. For example, if the address to be identified is "Unit 1, Building 5, Jiayuan community, Supo Street, Qingyang District, Chengdu, Sichuan Province", the address levels are: "province: Sichuan Province; city: Chengdu; county: Qingyang District; road: Supo Street; residential community: Jiayuan; building number: Building 5; unit number: Unit 1".
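The output of this step can be pictured as an ordered list of labeled levels. The sketch below only illustrates one possible representation of that output; it does not implement the conditional random field model, and the type name `AddressLevel`, the helper `format_levels` and the placeholder place names are assumptions.

```python
from typing import List, NamedTuple


class AddressLevel(NamedTuple):
    """One labeled segment of an address, e.g. ("province", "Province A")."""
    label: str
    value: str


def format_levels(levels: List[AddressLevel]) -> str:
    """Render labeled levels from the broadest to the most specific."""
    return "; ".join(f"{level.label}: {level.value}" for level in levels)


# Placeholder names standing in for the labels a segmentation model would emit.
example = [
    AddressLevel("province", "Province A"),
    AddressLevel("city", "City B"),
    AddressLevel("county", "District C"),
    AddressLevel("road", "Street D"),
    AddressLevel("building number", "Building 5"),
    AddressLevel("unit number", "Unit 1"),
]

if __name__ == "__main__":
    print(format_levels(example))
```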
203. Calculating the jump probability of each address level in the address to be identified jumping to the next adjacent address level by using the address-level jump probability distribution obtained by analyzing historical normal addresses.
Wherein, the address level jump probability distribution comprises the jump probability of any address level jumping to another address level. Because the historical normal address is an address successfully delivered by a merchant, the server can count and analyze the address level jump conditions of the historical normal address after obtaining a large number of historical normal addresses, and obtain the address level jump probability distribution, so that the jump conditions among all address levels of the address to be identified can be determined through the address level jump probability distribution.
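One way to obtain such a distribution is to count adjacent-level pairs in the historical normal addresses and normalize the counts. The sketch below assumes this relative-frequency estimate, which the text does not prescribe; the function name `estimate_jump_probabilities` and the toy level names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def estimate_jump_probabilities(
    normal_addresses: List[List[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(next level | current level) by relative frequency."""
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for levels in normal_addresses:
        # Count every jump from one level to the adjacent next level.
        for current, following in zip(levels, levels[1:]):
            pair_counts[current][following] += 1
    return {
        current: {
            following: count / sum(counter.values())
            for following, count in counter.items()
        }
        for current, counter in pair_counts.items()
    }


if __name__ == "__main__":
    history = [["A", "B", "C"], ["A", "B", "D"], ["A", "E"]]
    print(estimate_jump_probabilities(history))
```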
After each address level of the address to be identified is obtained, the server can use the address-level jump probability distribution to calculate the jump probability between adjacent address levels in the address, that is, the probability of the N-th level jumping to the (N+1)-th level. For example, after obtaining the address levels "province: Sichuan Province; city: Chengdu; county: Qingyang District; road: Supo Street; residential community: Jiayuan; building number: Building 5; unit number: Unit 1", the server can use the distribution to obtain the probability of jumping from Sichuan Province to Chengdu, from Chengdu to Qingyang District, from Qingyang District to Supo Street, from Supo Street to Jiayuan, from Jiayuan to Building 5, and from Building 5 to Unit 1.
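Looking up the jump probability for each adjacent pair can be sketched as follows. The nested-dictionary layout of the distribution, the function name `jump_probabilities`, and the small floor value used for jumps never seen in the historical data are all assumptions; the text does not say how unseen jumps are handled.

```python
from typing import Dict, List


def jump_probabilities(
    levels: List[str],
    distribution: Dict[str, Dict[str, float]],
    unseen: float = 1e-6,  # assumed floor for jumps absent from the history
) -> List[float]:
    """Return P(level N+1 | level N) for every adjacent pair of levels."""
    return [
        distribution.get(current, {}).get(following, unseen)
        for current, following in zip(levels, levels[1:])
    ]


if __name__ == "__main__":
    toy_distribution = {"A": {"B": 0.5}, "B": {"C": 0.25}}
    print(jump_probabilities(["A", "B", "C"], toy_distribution))
```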
204. Multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified.
When the address level jump probability of the historical normal address is trained, a large number of addresses in the national range can be used for training, and after the jump probability of each address level of the address to be recognized is obtained, the jump probabilities can be multiplied to obtain the probability that the address to be recognized belongs to the normal address.
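The multiplication step is a plain product over the jump probabilities. A minimal sketch, with the function name `normal_address_probability` chosen here for illustration:

```python
import math
from typing import Iterable


def normal_address_probability(jumps: Iterable[float]) -> float:
    """Multiply the per-level jump probabilities into one address score."""
    return math.prod(jumps)


if __name__ == "__main__":
    print(normal_address_probability([0.5, 0.4, 0.9]))
```

For long addresses the product of many small values can underflow, so an implementation might sum logarithms instead; that variant is a design option and is not something the text describes.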
In practical applications, some malicious addresses are pieced together in sequence from places in different provinces. For example, an address may begin with "Zhabei District, Shanghai" and continue with a street, an industrial park and a company located in Longgang District, Shenzhen: the "Zhabei District, Shanghai" part is a Shanghai address, while the remainder is a Guangdong Province address. When a large number of normal addresses from across the country are used for training, only the jump from Zhabei District to Longgang District is abnormal and all other jumps are normal, so the whole address receives a high normal address probability and is misjudged as a normal address. If only the historical normal addresses of Shanghai are used for training, the jump from Shanghai to Zhabei District is the only normal one and the jumps among the other address levels are abnormal, so the whole address receives a low normal address probability and is determined to be a malicious address. Adding the province as a variable therefore improves the accuracy of malicious address identification.
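The effect of the province variable can be sketched by keeping one jump table per province and scoring an address only against the table of its own province. The sketch assumes the first level of every address is its province, uses relative-frequency estimates and a small floor for unseen jumps, and uses placeholder level names; none of these details come from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List

JumpTable = Dict[str, Dict[str, float]]


def estimate_by_province(normal_addresses: List[List[str]]) -> Dict[str, JumpTable]:
    """Build a separate jump table for each province (assumed to be level 0)."""
    counts: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for levels in normal_addresses:
        province = levels[0]
        for current, following in zip(levels, levels[1:]):
            counts[province][current][following] += 1
    return {
        province: {
            current: {f: n / sum(c.values()) for f, n in c.items()}
            for current, c in table.items()
        }
        for province, table in counts.items()
    }


def score(levels: List[str], tables: Dict[str, JumpTable], unseen: float = 1e-6) -> float:
    """Multiply jump probabilities taken from the address's own province table."""
    table = tables.get(levels[0], {})
    probability = 1.0
    for current, following in zip(levels, levels[1:]):
        probability *= table.get(current, {}).get(following, unseen)
    return probability


if __name__ == "__main__":
    history = [["P1", "D1", "S1"], ["P2", "D2", "S2", "K2"]]
    tables = estimate_by_province(history)
    # An address spliced from P1 and P2: only its first jump is known in P1.
    print(score(["P1", "D1", "S2", "K2"], tables))
    print(score(["P1", "D1", "S1"], tables))
```

With a single nationwide table the spliced address would be penalized for only one jump; with per-province tables every jump after the splice is penalized, which matches the reasoning given above.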
In practical application, after the province is added as a variable, the formula for calculating the normal address probability of the address to be identified may be:
where S represents the address to be identified, w_i represents the i-th address level in the address to be identified, and C represents the province to which the address to be identified belongs.
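With these symbols, the product of jump probabilities conditioned on the province can be written in the following form. This is a sketch that assumes each address level depends only on the preceding level and on the province; n, the number of address levels in S, is a symbol introduced here for illustration, and the patent's exact formula may differ.

```latex
P(S \mid C) = \prod_{i=1}^{n-1} P(w_{i+1} \mid w_i, C)
```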
The method for identifying a malicious address provided by the embodiment of the invention can, after the address to be identified is obtained, perform address hierarchy processing on it to obtain each of its address levels, use the address level jump probability distribution to calculate the jump probability of each address level jumping to the next adjacent address level, and multiply the jump probabilities to obtain the probability that the address to be identified is a normal address, so that whether the address is malicious can be judged from this probability. The prior art judges whether an address is malicious by coarse filtering with malicious keywords, a black and white list, or the address hierarchy structure. In contrast, the invention statistically analyzes the correlation between address levels in historical normal addresses, uses the result to evaluate the jump probability of each address level of the address to be identified, and derives from those jump probabilities the probability that the whole address is a normal address. A normal address probability can thus be obtained not only for addresses that contain malicious keywords, appear in the black and white list, or have a complete address hierarchy structure, but also for addresses that contain no malicious keywords, do not appear in the black and white list, or have an incomplete address hierarchy structure. Whether the address to be identified is malicious can then be determined from the normal address probability, which improves the accuracy of malicious address identification.
Further, after the probability that the address to be identified belongs to the normal address is obtained, whether the address to be identified is a malicious address or not can be judged according to a preset identification rule and the normal address probability of the address to be identified.
Specifically, after obtaining the normal address probability of the address to be identified, the normal address probability may be directly used to determine whether the address to be identified is a malicious address, or other features corresponding to the address to be identified may be analyzed, and then whether the address to be identified is a malicious address is comprehensively determined according to the normal address probability and the other features (as described in steps 305 to 307 below). The specific implementation mode of directly judging whether the address to be identified is a malicious address by using the normal address probability is as follows: judging whether the normal address probability of the address to be identified is greater than a preset probability threshold value or not; if the probability of the normal address of the address to be identified is greater than a preset probability threshold, determining that the address to be identified is the normal address; and if the normal address probability of the address to be identified is less than or equal to the preset probability threshold, determining that the address to be identified is a malicious address.
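The direct threshold rule described above can be sketched as follows; the function name and the string labels are illustrative choices.

```python
def classify_by_threshold(normal_probability, threshold):
    """Return "normal" only when the probability is strictly greater than
    the preset threshold; a probability equal to the threshold is "malicious"."""
    return "normal" if normal_probability > threshold else "malicious"
```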
In addition, after the server obtains the identification result, the identification result can be sent to the merchant client, so that the merchant client can receive and display the identification result, and the merchant can determine whether to ship according to the identification result.
Further, according to the above embodiment, another embodiment of the present invention further provides a method for identifying a malicious address, as shown in fig. 4, the method mainly includes:
301. Receive the address to be identified sent by the user client.
302. Perform address hierarchy processing on the address to be identified to obtain each address level of the address to be identified.
303. Calculate the jump probability of each address level in the address to be identified jumping to the next adjacent address level, using the address level jump probability distribution obtained by analyzing historical normal addresses.
304. Multiply the obtained jump probabilities to obtain the normal address probability of the address to be identified.
305. Extract, from the order to be identified corresponding to the address to be identified and/or the historical orders corresponding to the order to be identified, preset identification features for identifying whether the address to be identified is a malicious address.
Specifically, the preset identification features include any one or a combination of any several of the following items: an address text information feature, a historical shopping behavior feature, an order feature, and a cross feature.
Correspondingly, this step can be broken down into the following steps (a) to (d):
(a) Extract the corresponding address text information features from the address to be identified.
The address text information features include: whether the address contains a number of a preset length, whether it contains a preset sensitive word, whether it contains advertisement information, and the like. The preset length includes the length of a mobile phone number, the length of a fixed-line telephone number, the length of a QQ number, and the like.
Users may fill in abusive content, mobile phone numbers, advertisement information, and the like in an address in order to abuse merchants or to advertise themselves or their goods, and a user who fills in such content is likely to be filling in a malicious address. The address text information features can therefore be extracted from the address to be identified so that whether the address is malicious can be analyzed from this dimension.
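A minimal sketch of step (a) using regular expressions. The digit lengths (11 digits starting with 1 for a mobile phone number, 5 to 10 digits for a QQ number), the feature names, and the word lists are assumed settings for illustration; the patent only says that the lengths are preset.

```python
import re

# Assumed preset lengths; a real deployment would configure these.
MOBILE_RE = re.compile(r"(?<!\d)1\d{10}(?!\d)")   # 11-digit mobile number
QQ_RE = re.compile(r"(?<!\d)[1-9]\d{4,9}(?!\d)")  # 5 to 10 digit QQ number

def address_text_features(address, sensitive_words, ad_words):
    """Return boolean text features for one address string."""
    return {
        "has_mobile_number": bool(MOBILE_RE.search(address)),
        "has_qq_length_number": bool(QQ_RE.search(address)),
        "has_sensitive_word": any(w in address for w in sensitive_words),
        "has_advertisement": any(w in address for w in ad_words),
    }

feats = address_text_features(
    "Building 5 call 13800000000 for discount", ["scam"], ["discount"]
)
```

The lookaround assertions keep a long digit run from being counted as a shorter number embedded inside it.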
(b) Extract historical shopping behavior features from the historical orders corresponding to the order to be identified.
The historical shopping behavior of a user can reflect whether the user is likely to fill in a malicious address. For example, a user who frequently has disputes with merchants, frequently requests refunds without reason, and has a low transaction success rate is likely to fill in a malicious address, whereas a user who never has disputes with merchants, never requests refunds, and has a high transaction success rate is less likely to do so. The historical shopping behavior features can therefore be extracted from the historical orders corresponding to the order to be identified and used as one dimension for judging whether the address to be identified is a malicious address.
In practical application, the historical shopping behavior features mainly include: the number of paid orders within a preset time period, the total amount paid within a preset time period, the total refund amount initiated within a preset time period, the transaction success rate within a preset time period, the number of merchants with whom disputes occurred within a preset time period, the complaint initiation rate within a preset time period, the proportion of refund disputes within a preset time period, and the like. The preset time periods of the individual features may be the same or different.
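Several of these aggregates can be sketched as follows, assuming the historical orders have already been restricted to one preset time period. The order record keys (`paid_amount`, `refund_amount`, `succeeded`, `disputed`, `merchant_id`) are hypothetical names chosen for this example.

```python
def shopping_behavior_features(orders):
    """Aggregate one user's historical orders within a preset time period.

    Each order is a dict with the hypothetical keys "paid_amount",
    "refund_amount", "succeeded", "disputed", and "merchant_id".
    """
    n = len(orders)
    disputed_merchants = {o["merchant_id"] for o in orders if o["disputed"]}
    return {
        "paid_order_count": n,
        "paid_total": sum(o["paid_amount"] for o in orders),
        "refund_total": sum(o["refund_amount"] for o in orders),
        # A user without history gets 0.0 here; that default is an assumption.
        "success_rate": sum(o["succeeded"] for o in orders) / n if n else 0.0,
        "disputed_merchant_count": len(disputed_merchants),
    }

orders = [
    {"paid_amount": 10.0, "refund_amount": 0.0, "succeeded": True,
     "disputed": False, "merchant_id": "m1"},
    {"paid_amount": 20.0, "refund_amount": 20.0, "succeeded": False,
     "disputed": True, "merchant_id": "m2"},
]
features = shopping_behavior_features(orders)
```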
(c) Extract the corresponding order features from the order to be identified.
Specifically, the order features include: whether the telephone number in the order to be identified is normal, whether the usage frequency of the address to be identified is greater than a preset usage threshold, the relevant status of the store corresponding to the order to be identified, and the relevant status of the goods corresponding to the order to be identified. The relevant status of the store includes: when the store opened, the fluctuation of the store's rating over a recent period, the number of times the store has been maliciously attacked, and the like. The relevant status of the goods includes: the sales volume of the goods, the price of the goods, whether the goods are popular, and the like.
Since a user may intentionally fill in an incorrect telephone number when filling in an address, or fill in a new address that has never been used, and since malicious activities tend to concentrate on large merchants or popular goods, the server can extract these order features from the order to be identified and use them as a dimension for analyzing whether the address to be identified is a malicious address.
(d) Obtain the cross features corresponding to the address to be identified from a combination of at least two of the following: the address text information features, the historical shopping behavior features, the order features, and the normal address probability of the address to be identified.
In practical application, the basic features (the address text information features, the historical shopping behavior features, the order features, and the normal address probability of the address to be identified) are cross-combined to produce more abstract feature descriptions. For example, cross-combining the address text information features with the order features yields the description "the address to be identified contains no meaningless text (that is, no telephone number, QQ number, preset sensitive word, advertisement information, or the like) and is an address commonly used by the user". The cross features corresponding to the address to be identified can therefore serve as another dimension for identifying malicious addresses.
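The cross combination in step (d) can be sketched as follows. The first feature mirrors the example given in the text; the second feature, the `is_common_address` key, and the 0.5 threshold are assumptions added for illustration.

```python
def cross_features(text_features, order_features, normal_probability,
                   threshold=0.5):
    """Combine basic features into more abstract boolean descriptions.

    text_features:  dict of booleans, True meaning suspicious text was found
    order_features: dict with the hypothetical key "is_common_address"
    """
    clean_text = not any(text_features.values())
    return {
        # The example from the text: no meaningless text AND a common address.
        "clean_and_common_address":
            clean_text and bool(order_features["is_common_address"]),
        # Assumed extra cross: clean text but an unlikely level sequence.
        "clean_but_low_probability":
            clean_text and normal_probability <= threshold,
    }

crossed = cross_features(
    {"has_mobile_number": False, "has_advertisement": False},
    {"is_common_address": True},
    normal_probability=0.8,
)
```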
306. Acquire a preset identification model trained on historical orders.
Specifically, the implementation manner of the server training the preset recognition model may be: firstly, acquiring a historical order; then obtaining the normal address probability of the historical address carried in the historical order according to the address level jump probability distribution; extracting preset identification features from the historical orders; and finally, training a preset recognition model through the normal address probability of each historical address and the corresponding preset recognition characteristics.
The historical orders comprise historical normal orders and historical malicious orders in a preset proportion; when the ratio of historical normal orders to historical malicious orders is about 4:1, the accuracy of malicious address identification is relatively high.
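One way to assemble a training set at roughly this ratio is to keep every historical malicious order and randomly sample normal orders to match. This is a sketch under that assumption; the patent does not describe how the proportion is achieved, and the function name and the 0/1 labels are illustrative.

```python
import random

def sample_training_orders(normal_orders, malicious_orders, ratio=4, seed=0):
    """Return (order, label) pairs with about `ratio` normal orders per
    malicious order. Label 0 means normal and label 1 means malicious."""
    rng = random.Random(seed)
    wanted = min(len(normal_orders), ratio * len(malicious_orders))
    sampled_normal = rng.sample(normal_orders, wanted)
    labeled = [(o, 0) for o in sampled_normal]
    labeled += [(o, 1) for o in malicious_orders]
    rng.shuffle(labeled)
    return labeled

training = sample_training_orders(list(range(100)), list(range(100, 105)))
```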
It should be noted that, in practical applications, the preset identification model trained in this step may be a GBDT (Gradient Boosting Decision Tree) model, or another model such as an SVM (Support Vector Machine) model, an LR (Logistic Regression) model, or a neural network model.
307. Judge whether the address to be identified is a malicious address according to the normal address probability of the address to be identified, the preset identification features, and the preset identification model.
After obtaining the normal address probability and the preset identification features of the address to be identified, the server can input the features into a preset identification model for identification, so that the preset identification model can perform comprehensive analysis on the features, obtain the probability that the address to be identified finally belongs to the normal address or the probability of the malicious address, and determine whether the address to be identified is the malicious address according to a preset normal probability threshold or a preset malicious address probability threshold.
308. If the address to be identified is judged to be a malicious address, send early warning prompt information to the merchant client so that the merchant client can receive and output the early warning prompt information.
When the server judges that the address to be identified is a malicious address, then in order to spare the merchant the economic and reputational losses caused by the malicious address, the server can send the merchant client, together with the order to be identified, early warning prompt information indicating that the address may be malicious. After receiving the early warning prompt information, the merchant can contact the buyer using the telephone number in the order to judge whether the address really is malicious. If the merchant determines that the address is malicious, the merchant can refuse to ship; if the merchant determines that the address is normal, the merchant can ship with confidence.
In addition, if the server judges that the address to be identified is a normal address, the server can send only the order to be identified to the merchant client, without early warning prompt information; when the merchant finds that the received order carries no early warning prompt information, the merchant can ship directly to the address in the order. However, the server may misjudge a malicious address as a normal address. Therefore, when the merchant finds during actual shipping that the address cannot be delivered to, the merchant can select the button in the merchant client that marks the address as malicious, so that the merchant client sends the address to be identified, carrying a malicious identifier, to the server. After receiving it, the server updates the historical normal address library and the historical malicious address library and retrains the preset identification model.
309. Receive the identification result sent by the merchant client, the result being a secondary identification of the address to be identified performed on the basis of the early warning prompt information.
In practical application, when a merchant determines that the address to be identified is a malicious address, the merchant can select, in a page of the early warning tool (or the selection interface mentioned in the above system embodiment), the button indicating that the address is determined to be malicious, so that the merchant client sends the address to be identified, carrying a malicious identifier, to the server. When the merchant determines that the address to be identified is a normal address, the merchant can select, in a page of the early warning tool, the button indicating that the address is determined to be normal, so that the merchant client sends the address to be identified, carrying a normal identifier, to the server.
310. If the identification result is that the address to be identified is a normal address, update the historical normal address library, the historical malicious address library, and the preset identification model.
When the secondary identification result sent by the merchant client is that the address to be identified is a normal address, the server determines that its judgment was wrong, immediately updates the historical normal address library and the historical malicious address library, re-analyzes the address level jump probability distribution, and retrains the preset identification model.
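The feedback handling can be sketched as follows, assuming the two address libraries are held as in-memory sets and that retraining is triggered whenever the merchant's label disagrees with the server's judgment. The function name, the string labels, and the set-based storage are illustrative assumptions.

```python
def apply_merchant_feedback(address, merchant_label, server_label,
                            normal_library, malicious_library):
    """Move the address into the library named by the merchant's label.

    Returns True when the server's judgment was wrong, which is when the
    jump probability distribution should be re-analyzed and the preset
    identification model retrained.
    """
    if merchant_label == "normal":
        malicious_library.discard(address)
        normal_library.add(address)
    else:
        normal_library.discard(address)
        malicious_library.add(address)
    return merchant_label != server_label

normal_lib, malicious_lib = set(), {"addr-1"}
needs_retraining = apply_merchant_feedback(
    "addr-1", "normal", "malicious", normal_lib, malicious_lib
)
```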
In addition, taking the GBDT model as an example, the interaction process between the server and the clients in the embodiment of the present invention may be as shown in fig. 5. As the above embodiment shows, the embodiment of the present invention can not only preliminarily obtain, from the address level jump probability distribution, the probability that the address to be identified is a normal address, but can also obtain other preset identification features, such as address text information features, historical shopping behavior features, order features, and cross features, from the historical orders and the order to be identified. The normal address probability of the address to be identified and the preset identification features are then input into the GBDT model (or another identification model) for comprehensive analysis to judge whether the address to be identified is a malicious address, which further improves the accuracy of malicious address identification. In addition, when the server finally determines that the address to be identified is a malicious address, it can send early warning prompt information to the merchant client, so that the merchant can contact the buyer to verify whether the address is malicious before deciding whether to ship, thereby avoiding further loss. Furthermore, after the merchant determines from the actual situation whether the address is malicious, the merchant can select the corresponding button in the merchant client to feed the actual result back to the server. The server can then determine from this feedback whether it made a misjudgment and, if so, retrain the GBDT model promptly, which refines the model and improves the accuracy of subsequent malicious address identification.
Further, according to the foregoing method embodiment, another embodiment of the present invention further provides an apparatus for identifying a malicious address, as shown in fig. 6, the apparatus mainly includes: a receiving unit 41, a first processing unit 42, a calculating unit 43, and a second processing unit 44. Wherein,
a receiving unit 41, configured to receive an address to be identified sent by a user client;
the first processing unit 42 is configured to perform address hierarchy processing on the address to be identified, so as to obtain each address hierarchy of the address to be identified;
a calculating unit 43, configured to calculate, by using address hierarchy jump probability distribution obtained through historical normal address analysis, a jump probability of each address hierarchy jumping to an adjacent next address hierarchy in an address to be identified, where the address hierarchy jump probability distribution includes a jump probability of any one address hierarchy jumping to another address hierarchy;
and the second processing unit 44 is configured to multiply the jump probabilities obtained by the calculating unit 43 to obtain the normal address probability of the address to be identified.
The malicious address identification device provided by the embodiment of the invention can, after the address to be identified is obtained, perform address hierarchy processing on it to obtain each of its address levels, use the address level jump probability distribution to calculate the jump probability of each address level jumping to the next adjacent address level, and multiply the jump probabilities to obtain the probability that the address to be identified is a normal address, so that whether the address is malicious can be judged from this probability. The prior art judges whether an address is malicious by coarse filtering with malicious keywords, a black and white list, or the address hierarchy structure. In contrast, the invention statistically analyzes the correlation between address levels in historical normal addresses, uses the result to evaluate the jump probability of each address level of the address to be identified, and derives from those jump probabilities the probability that the whole address is a normal address. A normal address probability can thus be obtained not only for addresses that contain malicious keywords, appear in the black and white list, or have a complete address hierarchy structure, but also for addresses that contain no malicious keywords, do not appear in the black and white list, or have an incomplete address hierarchy structure. Whether the address to be identified is malicious can then be determined from the normal address probability, which improves the accuracy of malicious address identification.
Further, as shown in fig. 7, the apparatus further includes:
and the judging unit 45 is configured to, after obtaining the normal address probability of the address to be recognized, judge whether the address to be recognized is a malicious address according to a preset recognition rule and the normal address probability of the address to be recognized.
Further, as shown in fig. 7, the judgment unit 45 includes:
the extracting module 451 is used for extracting preset identification features for identifying whether the address to be identified is a malicious address from the order to be identified corresponding to the address to be identified and/or the historical order corresponding to the order to be identified;
an obtaining module 452 configured to obtain a preset recognition model trained by a historical order;
the first determining module 453 is configured to determine whether the address to be recognized is a malicious address according to the normal address probability of the address to be recognized, the preset recognition feature, and the preset recognition model.
Further, as shown in fig. 7, the extraction module 451 includes:
a first extraction submodule 4511, configured to extract corresponding address text information features from an address to be identified;
the second extraction submodule 4512 is configured to extract historical shopping behavior features from a historical order corresponding to the order to be identified;
and the third extraction submodule 4513 is configured to extract corresponding order features from the order to be identified.
Further, as shown in fig. 7, the extraction module 451 further includes:
the obtaining submodule 4514 is configured to obtain a cross feature corresponding to the address to be identified according to a combination of at least two of the address text information feature, the historical shopping behavior feature, the order feature, and the normal address probability of the address to be identified.
Further, the address text information features extracted by the first extraction submodule 4511 include: whether the number with the preset length is included, whether the preset sensitive word is included and whether the advertisement information is included;
the order features extracted by the third extraction sub-module 4513 include: whether the telephone number in the order to be identified is normal or not, whether the use frequency of the address to be identified is greater than a preset use threshold value or not, and the relevant state of the shop corresponding to the order to be identified and the relevant state of the commodity corresponding to the order to be identified.
Further, the obtaining module 452 is further configured to obtain historical orders, where the historical orders include historical normal orders and historical malicious orders in a preset ratio;
the obtaining module 452 is further configured to obtain a normal address probability of a historical address carried in the historical order according to the conditional random field model and the address level jump probability distribution;
the extraction module 451 is also used for extracting preset identification features from the historical orders;
as shown in fig. 7, the judging unit 45 further includes:
the training module 454 is configured to train a preset recognition model according to the normal address probability of each historical address and the corresponding preset recognition feature.
Further, as shown in fig. 7, the judgment unit 45 includes:
the second judging module 455 is configured to judge whether the normal address probability of the address to be identified is greater than a preset probability threshold;
a determining module 456, configured to determine that the address to be identified is a normal address when the judgment result of the second judging module 455 is that the normal address probability of the address to be identified is greater than the preset probability threshold, and to determine that the address to be identified is a malicious address when the judgment result of the second judging module 455 is that the normal address probability of the address to be identified is less than or equal to the preset probability threshold.
Further, as shown in fig. 7, the apparatus further includes:
and the first sending unit 46 is configured to send the identification result of determining whether the address to be identified is the malicious address to the merchant client, so that the merchant client receives and outputs the identification result.
Further, as shown in fig. 7, the apparatus further includes:
a second sending unit 47, configured to send, when the determining unit 45 determines that the address to be identified is a malicious address, early warning prompt information to the merchant client, so that the merchant client receives and outputs the early warning prompt information;
the receiving unit 41 is configured to receive an identification result sent by the merchant client and used for performing secondary identification on the address to be identified based on the early warning prompt information;
and a first updating unit 48, configured to update the historical normal address library, the historical malicious address library, and the preset identification model when the identification result received by the receiving unit 41 is that the address to be identified is a normal address.
Further, the receiving unit 41 is configured to receive an address to be identified, which is sent by the merchant client and carries the malicious identifier;
as shown in fig. 7, the apparatus further includes:
and a second updating unit 49, configured to update the historical normal address library, the historical malicious address library, and the preset identification model.
Further, the address to be identified is an address obtained after the first processing unit 42 performs redundancy processing and formatting processing on the order to be identified.
Further, as shown in fig. 7, the first processing unit 42 includes:
the filtering module 421 is configured to filter the characters that meet the preset filtering condition in the address to be identified of the order to be identified;
the filtering module 421 is further configured to filter dirty data in the order to be identified;
the processing module 422 is configured to format the order to be identified, which is filtered by the filtering module 421, according to a preset formatting rule.
The malicious address identification device provided by the embodiment of the invention can not only preliminarily obtain, from the address level jump probability distribution, the probability that the address to be identified is a normal address, but can also obtain other preset identification features, such as address text information features, historical shopping behavior features, order features, and cross features, from the historical orders and the order to be identified. The normal address probability of the address to be identified and the preset identification features are then input into the preset identification model for comprehensive analysis to judge whether the address to be identified is a malicious address, which further improves the accuracy of malicious address identification. In addition, when the server finally determines that the address to be identified is a malicious address, it can send early warning prompt information to the merchant client, so that the merchant can contact the buyer to verify whether the address is malicious before deciding whether to ship, thereby avoiding further loss. Furthermore, after the merchant determines from the actual situation whether the address is malicious, the merchant can select the corresponding button in the merchant client to feed the actual result back to the server. The server can then determine from this feedback whether it made a misjudgment and, if so, retrain the preset identification model promptly, which refines the model and improves the accuracy of subsequent malicious address identification.
Further, in order to improve the accuracy of identifying malicious orders, another embodiment of the present invention provides a system for identifying malicious orders, which includes a user client, a server, and a merchant client; wherein,
the user client is used for receiving an input order to be identified and sending the order to be identified to the server;
the server is used for receiving an order to be identified sent by a user client, and calculating the jump probability of each address level jumping to the next adjacent address level in the address of the order to be identified based on the address level jump probability distribution obtained by analyzing historical normal addresses, wherein the address level jump probability distribution comprises the jump probability of any one address level jumping to another address level; multiplying the obtained jump probabilities to obtain the normal address probability of the address; judging whether the order to be identified is a malicious order or not according to the normal address probability, and sending a judgment result to a merchant client;
and the merchant client is used for receiving and displaying the judgment result sent by the server.
In the system for identifying malicious orders provided by the embodiment of the invention, after the server receives the order to be identified sent by the user client, it first uses the address level jump probability distribution to calculate the probability that the address in the order to be identified is a normal address, and then uses this probability to judge whether the order to be identified is a malicious order. The prior art judges whether an address is malicious by coarse filtering with malicious keywords, a black and white list, or the address hierarchy structure. In contrast, the invention statistically analyzes the correlation between address levels in historical normal addresses, uses the result to evaluate the jump probability of each address level of the address to be identified, and derives from those jump probabilities the probability that the whole address is a normal address. A normal address probability can thus be obtained not only for addresses that contain malicious keywords, appear in the black and white list, or have a complete address hierarchy structure, but also for addresses that contain no malicious keywords, do not appear in the black and white list, or have an incomplete address hierarchy structure. Whether the address is malicious is then determined from the normal address probability, which improves the accuracy of malicious address identification and, in turn, the accuracy of malicious order identification.
Further, according to the system for identifying a malicious order mentioned in the foregoing embodiment, another embodiment of the present invention provides a method for identifying a malicious order, as shown in fig. 8, the method mainly includes:
501. Receive the order to be identified sent by the user client.
After the user successfully places an order, the user client may upload the order to the server, and the server may perform malicious address identification on the order once it is received.
502. Based on the address level jump probability distribution obtained by analyzing historical normal addresses, calculate the jump probability of each address level in the address of the order to be identified jumping to the next adjacent address level.
The address level jump probability distribution comprises the jump probability of any address level jumping to another address level.
Specifically, the server may perform address hierarchy processing on the address of the order to be identified to obtain each address level of the address (see step 202 above); then, based on the address level jump probability distribution, it calculates the jump probability of each address level jumping to the next adjacent address level (see step 203 above).
503. Multiply the obtained jump probabilities to obtain the normal address probability of the address.
The specific implementation of this step is the same as that of step 204 and is not repeated here.
504. Judge whether the order to be identified is a malicious order according to the normal address probability.
Specifically, the server may first determine whether the address of the order to be identified is a malicious address according to the normal address probability; if the address of the order to be identified is a malicious address, determining that the order to be identified is a malicious order; and if the address of the order to be identified is a normal address, determining that the order to be identified is a normal order.
The specific implementation manner of determining whether the address of the order to be identified is a malicious address according to the normal address probability is the same as that in the embodiment of the "identification method of a malicious address", and details are not repeated here.
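Steps 502 to 504 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the table `JUMP_PROB` and its values are invented for the example, the address is assumed to have already been split into level labels, an unseen jump is assigned a probability of 0 (the `unseen` parameter is hypothetical), and the malicious address decision uses a simple preset probability threshold, under which a probability less than or equal to the threshold is treated as malicious.

```python
from math import prod

# Hypothetical jump probability distribution; values are illustrative only.
JUMP_PROB = {
    "province": {"city": 0.9, "district": 0.1},
    "city": {"district": 0.7, "road": 0.3},
    "district": {"road": 0.8, "number": 0.2},
    "road": {"number": 1.0},
}


def normal_address_probability(levels, jump_prob, unseen=0.0):
    """Multiply the jump probabilities of all adjacent level pairs."""
    steps = [
        # A jump absent from the distribution gets the `unseen` probability.
        jump_prob.get(current, {}).get(following, unseen)
        for current, following in zip(levels, levels[1:])
    ]
    return prod(steps)


def is_malicious_address(levels, jump_prob, threshold):
    """Treat a probability at or below the threshold as malicious."""
    return normal_address_probability(levels, jump_prob) <= threshold


complete = ["province", "city", "district", "road", "number"]
odd = ["province", "number"]
print(normal_address_probability(complete, JUMP_PROB))
print(is_malicious_address(odd, JUMP_PROB, threshold=0.05))
```

Note that in this sketch an address with a single level has no jumps, so the empty product evaluates to 1; a practical system would presumably need a separate rule for such addresses.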
Further, in practical applications, a malicious user often harasses the merchant by means other than entering a malicious address, for example by filling in a false telephone number so that the merchant cannot contact the user. Therefore, when the address of the order to be identified is determined to be a normal address, it is also necessary to judge whether the telephone number in the order to be identified is normal. If the telephone number is abnormal, the order to be identified is determined to be a malicious order; if the telephone number is normal, the order to be identified is determined to be a normal order.
One method for judging whether the telephone number is abnormal is to construct a normal telephone number database and match the telephone number to be identified against it. If the matching fails, the telephone number to be identified is determined to be abnormal; if the matching succeeds, it is determined to be normal.
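The two-stage decision described above (address check first, telephone number check second) can be sketched as follows. The function name `classify_order`, the use of an in-memory set as the "normal telephone number database", the threshold comparison, and the sample numbers are all assumptions made for illustration.

```python
def classify_order(normal_probability, phone, threshold, normal_phones):
    """Return "malicious" or "normal" for an order to be identified."""
    # Stage 1: a low normal address probability marks the order as malicious.
    if normal_probability <= threshold:
        return "malicious"
    # Stage 2: the address is normal, so check the telephone number
    # against the set of known normal telephone numbers.
    if phone not in normal_phones:
        return "malicious"
    return "normal"


# Illustrative values only.
normal_phones = {"13800000000", "13900000000"}
print(classify_order(0.504, "13800000000", 0.05, normal_phones))
print(classify_order(0.504, "00000000000", 0.05, normal_phones))
print(classify_order(0.0, "13800000000", 0.05, normal_phones))
```

The telephone number is consulted only when the address passes, which mirrors the order of checks in the text.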
In the malicious order identification method provided by the embodiment of the invention, after the server receives an order to be identified from the user client, it first uses the address level jump probability distribution to calculate the probability that the address in the order is a normal address, and then uses that probability to judge whether the order is a malicious order. The prior art judges whether an address is malicious by coarse filtering on malicious keywords, black and white lists, or the address hierarchy structure. By contrast, the invention statistically analyzes the correlation among address levels in historical normal addresses, uses the analysis result to determine the jump probability of each address level of the address to be identified, derives from these jump probabilities the probability that the whole address is a normal address, and determines from that normal address probability whether the address is malicious. A normal address probability can therefore be obtained not only for addresses that contain malicious keywords, appear in the black and white lists, or have a complete address hierarchy structure, but also for addresses that contain no malicious keywords, appear in neither list, or have an incomplete address hierarchy structure. This improves the accuracy of malicious address identification and, in turn, of malicious order identification.
Further, according to the method shown in fig. 8, another embodiment of the present invention provides a malicious order identification apparatus, as shown in fig. 9, the apparatus mainly includes:
the receiving unit 61 is used for receiving the order to be identified sent by the user client;
a calculating unit 62, configured to calculate, based on address hierarchy jump probability distribution obtained through historical normal address analysis, a jump probability that each address hierarchy jumps to an adjacent next address hierarchy in an address of an order to be identified, where the address hierarchy jump probability distribution includes a jump probability that an arbitrary address hierarchy jumps to another address hierarchy;
a processing unit 63, configured to multiply the obtained jump probabilities to obtain a normal address probability of the address;
and the judging unit 64 is used for judging whether the order to be identified is a malicious order or not according to the normal address probability.
Further, as shown in fig. 10, the judging unit 64 includes:
the judging module 641 is configured to judge whether the address of the order to be identified is a malicious address according to the normal address probability;
the determining module 642 is configured to determine that the order to be identified is a malicious order when the address of the order to be identified is a malicious address.
Further, the judging module 641 is further configured to judge whether the telephone number in the order to be identified is normal when the address of the order to be identified is a normal address;
the determining module 642 is further configured to determine that the order to be identified is a malicious order when the telephone number is abnormal.
Further, as shown in fig. 10, the calculating unit 62 includes:
the processing module 621 is configured to perform address hierarchy processing on an address of the order to be identified, so as to obtain each address hierarchy of the address;
a calculating module 622, configured to calculate a jump probability for each address level jumping to an adjacent next address level based on the address level jump probability distribution.
In the malicious order identification apparatus provided by the embodiment of the invention, after the server receives an order to be identified from the user client, it first uses the address level jump probability distribution to calculate the probability that the address in the order is a normal address, and then uses that probability to judge whether the order is a malicious order. The prior art judges whether an address is malicious by coarse filtering on malicious keywords, black and white lists, or the address hierarchy structure. By contrast, the invention statistically analyzes the correlation among address levels in historical normal addresses, uses the analysis result to determine the jump probability of each address level of the address to be identified, derives from these jump probabilities the probability that the whole address is a normal address, and determines from that normal address probability whether the address is malicious. A normal address probability can therefore be obtained not only for addresses that contain malicious keywords, appear in the black and white lists, or have a complete address hierarchy structure, but also for addresses that contain no malicious keywords, appear in neither list, or have an incomplete address hierarchy structure. This improves the accuracy of malicious address identification and, in turn, of malicious order identification.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method, apparatus and system described above may be cross-referenced. In addition, "first", "second", and the like in the above embodiments serve only to distinguish the embodiments and do not indicate their relative merits.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the malicious address/malicious order identification system, method and apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims (24)

1. A system for identifying a malicious address, characterized by comprising a user client, a server and a merchant client; wherein,
the user client is used for receiving an input address to be identified and sending the address to be identified to the server;
the server is used for receiving the address to be identified sent by the user client and carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified; calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy; multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified, and sending the identification result of malicious address identification based on the normal address probability to the merchant client;
and the merchant client is used for receiving and outputting the identification result sent by the server.
2. The system according to claim 1, wherein the server is configured to send an early warning prompt message to the merchant client when the identification result is that the address to be identified is a malicious address;
and the merchant client is used for receiving and outputting the early warning prompt information sent by the server.
3. The system of claim 2, wherein the merchant client is configured to output a selection interface for selecting an identification result of secondary identification of the address to be identified after receiving the warning prompt message, receive an identification result of secondary identification based on the input of the selection interface, and return the identification result of secondary identification to the server.
4. The system according to claim 2 or 3, wherein the merchant client is configured to, in a case where the warning prompt message is not received, output a selection interface for selecting an identification result for performing secondary identification on the address to be identified, receive an identification result input based on the selection interface and used for describing that the address to be identified is a malicious address, and return the address to be identified, which carries a malicious identifier, to the server.
5. A method for identifying a malicious address, the method comprising:
receiving an address to be identified sent by a user client;
carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified;
calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
multiplying the obtained jump probabilities to obtain the normal address probability of the address to be identified;
after obtaining the normal address probability of the address to be identified, the method further includes:
and judging whether the address to be identified is a malicious address or not according to a preset identification rule and the normal address probability of the address to be identified.
6. The method of claim 5, wherein judging whether the address to be identified is a malicious address according to a preset identification rule and a normal address probability of the address to be identified comprises:
extracting preset identification characteristics for identifying whether the address to be identified is a malicious address or not from the order to be identified corresponding to the address to be identified and/or the historical order corresponding to the order to be identified;
acquiring a preset identification model trained through a historical order;
and judging whether the address to be identified is a malicious address or not according to the normal address probability of the address to be identified, the preset identification characteristics and the preset identification model.
7. The method according to claim 6, wherein extracting preset identification features for identifying whether the address to be identified is a malicious address from the order to be identified corresponding to the address to be identified and/or the historical order corresponding to the order to be identified comprises:
extracting corresponding address text information characteristics from the address to be identified;
and/or extracting historical shopping behavior characteristics from a historical order corresponding to the order to be identified;
and/or extracting corresponding order characteristics from the order to be identified.
8. The method according to claim 7, wherein extracting preset identification features for identifying whether the address to be identified is a malicious address from the order to be identified corresponding to the address to be identified and/or the historical order corresponding to the order to be identified further comprises:
and acquiring the cross feature corresponding to the address to be identified according to the combination of at least two items of the address text information feature, the historical shopping behavior feature, the order feature and the normal address probability of the address to be identified.
9. The method of claim 7, wherein the address text information features comprise: whether a number of a preset length is included, whether a preset sensitive word is included, and whether advertisement information is included;
the order features include: whether the telephone number in the order to be identified is normal, whether the usage frequency of the address to be identified is greater than a preset usage threshold, the relevant state of the shop corresponding to the order to be identified, and the relevant state of the commodity corresponding to the order to be identified.
10. The method of claim 6, wherein prior to obtaining the pre-set recognition model trained by the historical orders, the method further comprises:
acquiring historical orders, wherein the historical orders comprise historical normal orders and historical malicious orders in a preset proportion;
obtaining the normal address probability of the historical address carried in the historical order according to the address level jump probability distribution;
extracting preset identification features from the historical orders;
and training the preset recognition model according to the normal address probability of each historical address and the corresponding preset recognition characteristics.
11. The method of claim 5, wherein judging whether the address to be identified is a malicious address according to a preset identification rule and a normal address probability of the address to be identified comprises:
judging whether the normal address probability of the address to be identified is greater than a preset probability threshold value or not;
if the normal address probability of the address to be identified is greater than the preset probability threshold, determining that the address to be identified is a normal address;
and if the normal address probability of the address to be identified is less than or equal to the preset probability threshold, determining that the address to be identified is a malicious address.
12. The method of claim 5, further comprising:
and sending the identification result for judging whether the address to be identified is the malicious address to a merchant client so that the merchant client can receive and output the identification result.
13. The method of claim 6, further comprising:
if the address to be identified is judged to be a malicious address, early warning prompt information is sent to a merchant client, so that the merchant client can receive and output the early warning prompt information;
receiving an identification result which is sent by the merchant client and used for carrying out secondary identification on the address to be identified based on the early warning prompt information;
and if the identification result is that the address to be identified is a normal address, updating a historical normal address library, a historical malicious address library and the preset identification model.
14. The method of claim 6, further comprising:
receiving the address to be identified which is sent by a merchant client and carries a malicious identifier;
and updating a historical normal address library, a historical malicious address library and the preset identification model.
15. The method of claim 5, wherein the address to be identified is an address obtained after performing redundancy processing and formatting processing on the order to be identified.
16. The method of claim 15, wherein redundantly processing and formatting the order to be identified comprises:
filtering characters which meet preset filtering conditions in the address to be identified of the order to be identified;
filtering dirty data in the order to be identified;
and formatting the filtered order to be identified according to a preset formatting rule.
17. The method according to any one of claims 5 to 16, wherein address-layering the address to be identified comprises:
and carrying out address hierarchy processing on the address to be identified based on a conditional random field model.
18. An apparatus for identifying a malicious address, the apparatus comprising:
the receiving unit is used for receiving the address to be identified sent by the user client;
the first processing unit is used for carrying out address hierarchy processing on the address to be identified to obtain each address hierarchy of the address to be identified;
the calculation unit is used for calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the address to be identified by using address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any address hierarchy jumping to another address hierarchy;
the second processing unit is used for multiplying the jump probabilities obtained by the calculating unit to obtain the normal address probability of the address to be identified;
the device further comprises:
and the judging unit is used for judging whether the address to be identified is a malicious address or not according to a preset identification rule and the normal address probability of the address to be identified after the normal address probability of the address to be identified is obtained.
19. A system for identifying a malicious order, characterized by comprising a user client, a server and a merchant client; wherein,
the user client is used for receiving an input order to be identified and sending the order to be identified to the server;
the server is used for receiving the order to be identified sent by the user client, and calculating the jump probability of each address level jumping to the next adjacent address level in the address of the order to be identified based on the address level jump probability distribution obtained by analyzing historical normal addresses, wherein the address level jump probability distribution comprises the jump probability of any address level jumping to another address level; multiplying the obtained jump probabilities to obtain the normal address probability of the address; judging whether the order to be identified is a malicious order or not according to the normal address probability, and sending a judgment result to the merchant client;
and the merchant client is used for receiving and displaying the judgment result sent by the server.
20. A method of identifying a malicious order, the method comprising:
receiving an order to be identified sent by a user client;
calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the addresses of the order to be identified based on the address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
multiplying the obtained jump probabilities to obtain the normal address probability of the address;
and judging whether the order to be identified is a malicious order or not according to the normal address probability.
21. The method of claim 20, wherein determining whether the order to be identified is a malicious order based on the normal address probability comprises:
judging whether the address of the order to be identified is a malicious address or not according to the normal address probability;
and if the address of the order to be identified is a malicious address, determining that the order to be identified is a malicious order.
22. The method of claim 21, wherein if the address of the order to be identified is a normal address, the method further comprises:
judging whether the telephone number in the order to be identified is normal or not;
and if the telephone number is abnormal, determining that the order to be identified is a malicious order.
23. The method according to any one of claims 20 to 22, wherein calculating a jump probability for each address level in the addresses of the order to be identified to jump to an adjacent next address level based on an address level jump probability distribution derived from a historical normal address analysis comprises:
carrying out address hierarchy processing on the address of the order to be identified to obtain each address hierarchy of the address;
and calculating the jump probability of each address level jumping to the next adjacent address level based on the address level jump probability distribution.
24. An apparatus for identifying malicious orders, the apparatus comprising:
the receiving unit is used for receiving the order to be identified sent by the user client;
the calculating unit is used for calculating the jump probability of each address hierarchy jumping to the next adjacent address hierarchy in the addresses of the order to be identified based on the address hierarchy jump probability distribution obtained by analyzing historical normal addresses, wherein the address hierarchy jump probability distribution comprises the jump probability of any one address hierarchy jumping to another address hierarchy;
the processing unit is used for multiplying the obtained jump probabilities to obtain the normal address probability of the address;
and the judging unit is used for judging whether the order to be identified is a malicious order or not according to the normal address probability.
CN201610797563.7A 2016-08-31 2016-08-31 Malice address/malice order identifying system, method and device Active CN107798571B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610797563.7A CN107798571B (en) 2016-08-31 2016-08-31 Malice address/malice order identifying system, method and device
TW106119860A TW201812689A (en) 2016-08-31 2017-06-14 System, method, and device for identifying malicious address/malicious purchase order
PCT/CN2017/097953 WO2018040944A1 (en) 2016-08-31 2017-08-18 System, method, and device for identifying malicious address/malicious purchase order

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610797563.7A CN107798571B (en) 2016-08-31 2016-08-31 Malice address/malice order identifying system, method and device

Publications (2)

Publication Number Publication Date
CN107798571A CN107798571A (en) 2018-03-13
CN107798571B true CN107798571B (en) 2019-08-30

Family

ID=61301279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610797563.7A Active CN107798571B (en) 2016-08-31 2016-08-31 Malice address/malice order identifying system, method and device

Country Status (3)

Country Link
CN (1) CN107798571B (en)
TW (1) TW201812689A (en)
WO (1) WO2018040944A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108683749B (en) * 2018-05-18 2021-07-06 携程旅游信息技术(上海)有限公司 Method, device and medium for judging random mailbox address
CN108876545A (en) * 2018-06-22 2018-11-23 北京小米移动软件有限公司 Order recognition method, device and readable storage medium
CN110852080B (en) * 2018-08-01 2024-06-21 北京京东尚科信息技术有限公司 Order address identification method, system, equipment and storage medium
CN109345332A (en) * 2018-08-27 2019-02-15 中国民航信息网络股份有限公司 Intelligent detection method for malicious airline reservation behavior
CN110874778B (en) * 2018-08-31 2023-04-25 阿里巴巴集团控股有限公司 Abnormal order detection method and device
CN109407504B (en) * 2018-11-30 2021-05-14 华南理工大学 Personal safety detection system and method based on smart watch
CN109587248B (en) * 2018-12-06 2023-08-29 腾讯科技(深圳)有限公司 User identification method, device, server and storage medium
CN109947564B (en) * 2019-03-07 2023-04-11 蚂蚁金服(杭州)网络技术有限公司 Service processing method, device, equipment and storage medium
CN110335115A (en) * 2019-07-01 2019-10-15 阿里巴巴集团控股有限公司 Service order processing method and device
CN110503517A (en) * 2019-08-13 2019-11-26 蚌埠聚本电子商务产业园有限公司 Fallacious message detection and disposal method for e-commerce
CN110807685B (en) * 2019-10-22 2021-09-07 上海钧正网络科技有限公司 Information processing method, device, terminal and readable storage medium
CN112950298A (en) * 2019-11-26 2021-06-11 北京沃东天骏信息技术有限公司 Malicious order identification method and device and storage medium
CN111132144B (en) * 2019-12-25 2022-09-13 中国联合网络通信集团有限公司 Abnormal number identification method and equipment
CN111461815B (en) * 2020-03-17 2023-04-28 上海携程国际旅行社有限公司 Order recognition model generation method, recognition method, system, equipment and medium
CN111859956B (en) * 2020-07-09 2021-08-27 睿智合创(北京)科技有限公司 Address word segmentation method for financial industry
CN111935646B (en) * 2020-07-22 2022-09-20 北京明略昭辉科技有限公司 Method and system for estimating common address of mobile equipment user
CN111915256B (en) * 2020-07-31 2023-09-26 上海寻梦信息技术有限公司 Method for constructing dispatch fence, off-site signing and identifying method and related equipment
CN112101993B (en) * 2020-09-11 2022-12-23 厦门美图之家科技有限公司 Offline anti-cheating method and device, electronic equipment and readable storage medium
CN112446425B (en) * 2020-11-20 2024-10-25 北京思特奇信息技术股份有限公司 Method and device for automatically acquiring suspected card-keeping channel
CN112491863B (en) * 2020-11-23 2022-07-29 中国联合网络通信集团有限公司 IP address black and gray list analysis method, server, terminal and storage medium
CN112686732B (en) * 2021-01-06 2023-07-11 中国联合网络通信集团有限公司 Abnormal address data identification method, device, equipment and medium
CN113240480A (en) * 2021-01-25 2021-08-10 天津五八到家货运服务有限公司 Order processing method and device, electronic terminal and storage medium
CN113076752A (en) * 2021-03-26 2021-07-06 中国联合网络通信集团有限公司 Method and device for identifying address
CN113449523B (en) * 2021-06-29 2024-05-24 京东科技控股股份有限公司 Method and device for determining abnormal address text, electronic equipment and storage medium
CN116934418B (en) * 2023-06-15 2024-03-19 广州淘通科技股份有限公司 Abnormal order detection and early warning method, system, equipment and storage medium
CN117371893A (en) * 2023-10-09 2024-01-09 杭州正马软件科技有限公司 System and method for automatically changing e-commerce order address

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008038017A1 (en) * 2006-09-29 2008-04-03 British Telecommunications Public Limited Company Information processing system and related method
CN103095711A (en) * 2013-01-18 2013-05-08 重庆邮电大学 Application layer distributed denial of service (DDoS) attack detection method and defensive system aimed at website
CN104462059A (en) * 2014-12-01 2015-03-25 银联智惠信息服务(上海)有限公司 Merchant address information recognition method and device
CN105389722A (en) * 2015-11-20 2016-03-09 小米科技有限责任公司 Malicious order identification method and device
CN105468742A (en) * 2015-11-25 2016-04-06 小米科技有限责任公司 Malicious order recognition method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008038017A1 (en) * 2006-09-29 2008-04-03 British Telecommunications Public Limited Company Information processing system and related method
CN103095711A (en) * 2013-01-18 2013-05-08 重庆邮电大学 Application layer distributed denial of service (DDoS) attack detection method and defensive system aimed at website
CN104462059A (en) * 2014-12-01 2015-03-25 银联智惠信息服务(上海)有限公司 Merchant address information recognition method and device
CN105389722A (en) * 2015-11-20 2016-03-09 小米科技有限责任公司 Malicious order identification method and device
CN105468742A (en) * 2015-11-25 2016-04-06 小米科技有限责任公司 Malicious order recognition method and device
CN105468742B (en) * 2015-11-25 2018-11-20 小米科技有限责任公司 Malicious order recognition method and device

Also Published As

Publication number Publication date
CN107798571A (en) 2018-03-13
WO2018040944A1 (en) 2018-03-08
TW201812689A (en) 2018-04-01

Similar Documents

Publication Publication Date Title
CN107798571B (en) Malice address/malice order identifying system, method and device
CN108205768B (en) Database establishing method, data recommending device, equipment and storage medium
CA2917256C (en) Screenshot-based e-commerce
WO2020001106A1 (en) Classification model training method and store classification method and device
CN109493101A (en) Target brand message determination method, apparatus, electronic equipment and storage medium
CN110827112A (en) Deep learning commodity recommendation method and device, computer equipment and storage medium
WO2014053453A1 (en) Remote system interaction
CN110362702B (en) Picture management method and equipment
CN115293332A (en) Method, device and equipment for training graph neural network and storage medium
CN112328802A (en) Data processing method and device and server
CN115147130A (en) Problem prediction method, apparatus, storage medium, and program product
CN110363206B (en) Clustering of data objects, data processing and data identification method
CN111091409B (en) Client tag determination method and device and server
CN113327132A (en) Multimedia recommendation method, device, equipment and storage medium
US20240134860A1 (en) Order searching method, apparatus, computer device, and storage medium
CN113495987A (en) Data searching method, device, equipment and storage medium
CN118193806A (en) Target retrieval method, target retrieval device, electronic equipment and storage medium
CN116010707A (en) Commodity price anomaly identification method, device, equipment and storage medium
CN112015970A (en) Product recommendation method, related equipment and computer storage medium
CN116703515A (en) Recommendation method and device based on artificial intelligence, computer equipment and storage medium
CN113077292B (en) User classification method and device, storage medium and electronic equipment
CN111753181A (en) Image-based search method, apparatus, server, client, and medium
CN114169006A (en) Training method of privacy compliance detection model, and privacy compliance detection method and device
KR20210111117A (en) Transaction system based on extracted image from uploaded media
CN112949752B (en) Training method and device of business prediction system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1252396; Country of ref document: HK)
GR01 Patent grant