CN117294873A - Abnormal media resource detection method and device, storage medium and electronic equipment - Google Patents
Abnormal media resource detection method and device, storage medium and electronic equipment
- Publication number
- CN117294873A (application CN202311205268.4A)
- Authority
- CN
- China
- Prior art keywords
- resource
- sequence
- abnormal
- candidate
- media
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Graphics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The disclosure provides a method and a device for detecting abnormal media resources, a storage medium, and electronic equipment. The method comprises: acquiring a plurality of resource operation sequences of a candidate object account, wherein each resource operation sequence comprises a plurality of consecutive resource operations performed by the candidate object account on media resources on a resource platform; acquiring the sequence similarity between every two of the plurality of resource operation sequences, and clustering the plurality of resource operation sequences according to the sequence similarity to obtain at least one operation sequence set; determining the candidate object account to be an abnormal object account when the at least one operation sequence set includes an abnormal sequence set; and determining abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises the media resources corresponding to the resource operations performed by the abnormal object account. The method and the device solve the technical problem that abnormal media resources are detected with low efficiency in the related art.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for detecting an abnormal media resource, a storage medium, and an electronic device.
Background
A media resource platform typically uses the heat (popularity) of a media resource as an important indicator when deciding what to push. In other words, the higher the heat of a work, the more strongly the platform pushes it, which in turn produces more high-heat works on the platform.
Under this pushing rule, some accounts raise the heat of their own works by purchasing a brushing service, and can obtain a large amount of heat within a short period after the purchase. The platform may then judge the video to be a high-quality video and recommend it to more people. The author's video thus gains a large amount of exposure and quickly attracts attention, and the author ultimately profits from it.
In the related art, heat monitoring is usually performed on every media resource on the platform, and whether a work has been subject to brushing is determined from the monitoring result. In this approach, a cheating judgment can be made only on the heat characteristics of a single work, and only after each work has been heat-monitored, so abnormal media resources are detected with low efficiency.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a method and a device for detecting abnormal media resources, a storage medium and electronic equipment, which are used for at least solving the technical problem of low detection efficiency of the existing abnormal media resources.
According to an aspect of an embodiment of the present invention, there is provided a method for detecting an abnormal media resource, including: acquiring a plurality of resource operation sequences of a candidate object account, wherein each resource operation sequence comprises a plurality of consecutive resource operations performed by the candidate object account on media resources on a resource platform; acquiring the sequence similarity between every two of the plurality of resource operation sequences, and clustering the plurality of resource operation sequences according to the sequence similarity to obtain at least one operation sequence set; determining the candidate object account to be an abnormal object account when the at least one operation sequence set includes an abnormal sequence set, wherein the sequence similarity between the resource operation sequences included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold; and determining an abnormal media resource from a media resource set associated with the abnormal object account, wherein the media resource set comprises the media resources corresponding to the resource operations performed by the abnormal object account.
According to another aspect of the embodiment of the present invention, there is also provided a device for detecting an abnormal media resource, including: the acquisition unit is used for acquiring a plurality of resource operation sequences of the candidate object account, wherein the resource operation sequences comprise a plurality of continuous resource operations which are executed on the media resource by the candidate object account on the resource platform; a clustering unit, configured to perform clustering operation on a plurality of the resource operation sequences according to sequence similarity between the resource operation sequences, to obtain at least one operation sequence set; a determining unit, configured to determine, when at least one of the operation sequence sets includes an abnormal sequence set, the candidate object account as an abnormal object account, where the sequence similarity between the resource operation sequence and the abnormal operation sequence included in the abnormal sequence set is greater than or equal to a target threshold; the detecting unit is used for determining abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises media resources corresponding to the resource operation executed by the abnormal object account.
Optionally, the clustering unit includes a clustering module, which is used for initially treating each of the plurality of resource operation sequences as its own candidate operation sequence set, and repeating the following steps until the set similarity between every two candidate operation sequence sets is smaller than a reference threshold: determining the set similarity between every two candidate operation sequence sets according to the sequence similarity, and merging the two candidate operation sequence sets whose set similarity meets a clustering condition to obtain an updated candidate operation sequence set; and acquiring the set similarity between every two candidate operation sequence sets according to the updated candidate operation sequence set and the remaining candidate operation sequence sets.
Optionally, the clustering module is further configured to one of: under the condition that the set similarity between two reference operation sequence sets is higher than the set similarity between every two other candidate operation sequence sets, determining that the set similarity meets the clustering condition; and under the condition that the set similarity between the two reference operation sequence sets is higher than a similarity threshold value, determining that the set similarity meets the clustering condition.
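The merging loop described in the two paragraphs above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the feature vectors, the centroid-based set similarity `1 / (1 + distance)`, and the names `agglomerate`, `set_similarity` and `reference_threshold` are assumptions introduced here. The clustering condition used is the first option above (merge the pair whose set similarity is the highest).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean of the feature vectors in one candidate set (assumed set feature)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def set_similarity(a: List[Vector], b: List[Vector]) -> float:
    """Similarity in (0, 1] derived from the centroid distance (an assumption)."""
    return 1.0 / (1.0 + math.dist(centroid(a), centroid(b)))


def agglomerate(features: List[Vector], reference_threshold: float) -> List[List[int]]:
    """Merge the most similar pair of sets until every pair is below the threshold.

    Returns the candidate sets as lists of indices into `features`.
    """
    clusters = [[i] for i in range(len(features))]  # each sequence starts as its own set
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = set_similarity(
                    [features[k] for k in clusters[i]],
                    [features[k] for k in clusters[j]],
                )
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        if best_sim < reference_threshold:
            break  # every remaining pair is below the reference threshold
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]  # updated candidate set
        del clusters[j]
    return clusters


# Hypothetical 2-D sequence features: two tight groups far apart.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
clusters = agglomerate(features, reference_threshold=0.5)
```

With these toy features the two nearby pairs are merged first, and the loop stops once the only remaining pair of sets falls below the reference threshold.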
Optionally, the clustering module is configured to: when each of the two candidate operation sequence sets comprises one resource operation sequence, take the sequence similarity between the two resource operation sequences as the set similarity; when each of the two candidate operation sequence sets comprises a plurality of resource operation sequences, determine the set feature of each candidate operation sequence set according to the sequence features of the resource operation sequences in that set, and determine the feature distance between the set features of the two candidate operation sequence sets as the set similarity; and when a first candidate operation sequence set of the two comprises one resource operation sequence and a second candidate operation sequence set of the two comprises a plurality of resource operation sequences, determine the set feature of the second candidate operation sequence set according to the sequence features of the resource operation sequences it includes, acquire the sequence feature of the resource operation sequence included in the first candidate operation sequence set, and determine the feature distance between the set feature and the sequence feature as the set similarity.
Optionally, the above clustering module is further configured to: respectively acquiring sequence characteristics of each of a plurality of resource operation sequences through a target characteristic extraction model; and determining the feature distance between every two sequence features in turn, and determining the feature distance as the sequence similarity between the two corresponding resource operation sequences.
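As a rough illustration of the paragraph above, the sketch below extracts a feature per resource operation sequence and computes the feature distance for every pair. The target feature extraction model is not specified at this point, so a simple operation-frequency vector stands in for it; the vocabulary, the function names and the choice of Euclidean distance are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical operation vocabulary, used only for this example.
VOCABULARY = ["search", "like", "comment", "post"]


def extract_sequence_feature(sequence: Sequence[str]) -> List[float]:
    """Stand-in for the feature extraction model: relative frequency of each operation."""
    total = len(sequence) or 1
    return [list(sequence).count(op) / total for op in VOCABULARY]


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(sequences: List[Sequence[str]]) -> Dict[Tuple[int, int], float]:
    """Feature distance for every pair of sequences, keyed by their index pair."""
    features = [extract_sequence_feature(s) for s in sequences]
    return {
        (i, j): feature_distance(features[i], features[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


sequences = [
    ["search", "like", "comment", "like"],
    ["like", "search", "like", "comment"],
    ["post", "post", "search", "post"],
]
distances = pairwise_distances(sequences)
```

A frequency vector ignores the order of operations, which is why the first two sequences end up at distance zero; a learned sequence model would normally be sensitive to order as well.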
Optionally, the clustering module is configured to: obtain an operation log of the candidate object account in a target period, wherein the operation log comprises operation records of the resource operations performed by the candidate object account on media resources on the resource platform; generate a reference resource operation sequence of the candidate object account in the target period by arranging the operation records included in the operation log in time order; and divide the reference resource operation sequence according to a target length to obtain the plurality of resource operation sequences, wherein each of the plurality of resource operation sequences comprises the same number of resource operations.
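A minimal sketch of this log-to-sequences step is given below. The record layout `(timestamp, operation_code)`, the operation names, and the decision to drop a trailing remainder shorter than the target length are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical log record: (timestamp, operation_code).
OperationRecord = Tuple[int, str]


def build_reference_sequence(log: List[OperationRecord]) -> List[str]:
    """Order the operation records by time and keep only the operation codes."""
    return [op for _, op in sorted(log, key=lambda record: record[0])]


def split_by_target_length(reference: List[str], target_length: int) -> List[List[str]]:
    """Cut the reference sequence into consecutive chunks of exactly target_length operations.

    A trailing remainder shorter than target_length is dropped (an assumption),
    so that every resulting sequence holds the same number of operations.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    full_chunks = len(reference) // target_length
    return [
        reference[i * target_length:(i + 1) * target_length]
        for i in range(full_chunks)
    ]


log = [
    (3, "comment"), (1, "search"), (2, "like"), (5, "like"),
    (4, "search"), (6, "comment"), (7, "like"),
]
reference = build_reference_sequence(log)
sequences = split_by_target_length(reference, target_length=3)
```

Here the seven logged operations yield two sequences of three operations each, and the seventh operation is discarded.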
Optionally, the device for detecting abnormal media resources further includes one of the following: a first obtaining unit configured to obtain a set feature of a third operation sequence set from the at least one operation sequence set, and determine that the third operation sequence set is the abnormal sequence set when a feature similarity between the set feature and a sequence feature of the abnormal operation sequence is greater than or equal to a first threshold; a second obtaining unit, configured to obtain a plurality of third resource operation sequences included in a third operation sequence set in the at least one operation sequence set, and determine that the third operation sequence set is the abnormal sequence set when a proportion of the abnormal operation sequences included in the plurality of third resource operation sequences is greater than or equal to a second threshold.
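The second option above (the proportion of abnormal operation sequences in a set) might look like the sketch below. The position-matching similarity, the sample sequences and both threshold values are hypothetical; any sequence similarity could be passed in instead.

```python
from typing import Callable, List, Sequence


def is_abnormal_set(
    sequences: List[Sequence[str]],
    known_abnormal: List[Sequence[str]],
    similarity: Callable[[Sequence[str], Sequence[str]], float],
    target_threshold: float,
    second_threshold: float,
) -> bool:
    """Flag a set when enough of its sequences resemble a known abnormal sequence."""
    if not sequences:
        return False
    hits = sum(
        1
        for seq in sequences
        if any(similarity(seq, bad) >= target_threshold for bad in known_abnormal)
    )
    return hits / len(sequences) >= second_threshold


def positional_match(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy similarity: share of positions that hold the same operation."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)


known_abnormal = [["like", "like", "comment", "like"]]
suspicious_set = [
    ["like", "like", "comment", "like"],
    ["like", "like", "comment", "search"],
    ["search", "post", "search", "post"],
]
flagged = is_abnormal_set(suspicious_set, known_abnormal, positional_match, 0.75, 0.6)
```

Two of the three sequences reach the target threshold, so the set is flagged when the second threshold is 0.6 but not when it is raised to 0.7.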
Optionally, the detection unit includes at least one of: a first detection module, used for acquiring the media resource set associated with the abnormal object account, extracting the resource operation feature of each media resource in the media resource set, wherein the resource operation feature indicates the change trend of each of a plurality of resource indexes of the media resource, and determining the abnormal media resources from the media resource set according to the resource operation features; and a second detection module, used for acquiring the media resource set associated with each of a plurality of abnormal object accounts, and determining the media resources included in the intersection of these media resource sets as the abnormal media resources.
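The intersection approach of the second detection module can be illustrated with plain set operations. The account and media identifiers below are made up for the example.

```python
from typing import Dict, Set


def intersect_media_sets(media_by_account: Dict[str, Set[str]]) -> Set[str]:
    """Return the media resources operated on by every listed abnormal account."""
    media_sets = list(media_by_account.values())
    if not media_sets:
        return set()
    return set.intersection(*media_sets)


# Hypothetical media resource sets of three abnormal object accounts.
media_by_account = {
    "account_a": {"video_1", "video_2", "video_3"},
    "account_b": {"video_2", "video_3", "video_9"},
    "account_c": {"video_3", "video_2"},
}
abnormal_media = intersect_media_sets(media_by_account)
```

Only the resources touched by all three accounts remain, which is the batch-style shortcut the text describes: no per-resource heat monitoring is needed to shortlist them.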
Optionally, the detecting device for abnormal media resources further includes at least one of the following: the pushing unit is used for reducing the pushing probability of pushing the abnormal media resources to the object account on the resource platform; the prompting unit is used for pushing prompting information to the target object account for issuing the abnormal media resources; and the setting unit is used for setting the abnormal media resource to be in an inaccessible state on the resource platform.
Optionally, the second detection module is further configured to: acquiring a first media resource set associated with a first abnormal object account; traversing each first media resource in the first media resource set, and acquiring an object account set corresponding to each first media resource, wherein the object account set comprises a plurality of object accounts for executing the resource operation on the first media resource; and determining a second abnormal object account from the object account set.
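One way to picture this expansion from a first abnormal object account to further suspects is sketched below. The text does not say how the second abnormal object account is chosen from the object account set, so the co-occurrence count and the `min_shared` cut-off are assumptions; in practice each shortlisted account could simply be fed back into the sequence clustering as a new candidate.

```python
from collections import Counter
from typing import Dict, List, Set


def candidate_second_accounts(
    first_abnormal: str,
    media_by_account: Dict[str, Set[str]],
    accounts_by_media: Dict[str, Set[str]],
    min_shared: int,
) -> List[str]:
    """Accounts that operated on at least min_shared media resources of the first account."""
    shared: Counter = Counter()
    # Traverse each media resource in the first media resource set.
    for media in media_by_account.get(first_abnormal, set()):
        # Object account set of this media resource.
        for account in accounts_by_media.get(media, set()):
            if account != first_abnormal:
                shared[account] += 1
    return sorted(account for account, count in shared.items() if count >= min_shared)


# Hypothetical data for illustration.
media_by_account = {"acct_1": {"video_1", "video_2", "video_3"}}
accounts_by_media = {
    "video_1": {"acct_1", "acct_2", "acct_3"},
    "video_2": {"acct_1", "acct_2"},
    "video_3": {"acct_1", "acct_2", "acct_4"},
}
candidates = candidate_second_accounts(
    "acct_1", media_by_account, accounts_by_media, min_shared=2
)
```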
According to yet another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described method of detecting an abnormal media resource when run.
According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the method of detecting an abnormal media resource as above.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device including a memory, in which a computer program is stored, and a processor configured to execute the above method for detecting an abnormal media resource by the above computer program.
In the embodiment of the invention, a plurality of resource operation sequences of a candidate object account are acquired, wherein each resource operation sequence comprises a plurality of consecutive resource operations performed by the candidate object account on media resources on a resource platform; the sequence similarity between every two of the plurality of resource operation sequences is acquired, and the plurality of resource operation sequences are clustered according to the sequence similarity to obtain at least one operation sequence set; the candidate object account is determined to be an abnormal object account when the at least one operation sequence set includes an abnormal sequence set; and the abnormal media resources are determined from the media resource set associated with the abnormal object account, wherein the media resource set comprises the media resources corresponding to the resource operations performed by the abnormal object account, so that abnormal media resources are searched for efficiently within the media resource set associated with the abnormal object account.
In the method for detecting abnormal media resources, resource operation sequences representing the resource operations performed by the candidate object account on the resource platform are first obtained, where each resource operation sequence may comprise a plurality of resource operations. Operation sequence sets are then determined from the clustering result of the plurality of resource operation sequences, and when the operation sequence sets include an abnormal sequence set, the candidate object account can be determined to be an abnormal object account. The abnormal media resources are then determined from the media resource set associated with the abnormal object account. Detection thus starts from the abnormal object account and checks the media resources associated with it in batches, which achieves rapid detection of abnormal media resources and solves the technical problem of low detection efficiency in existing detection methods.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment of an alternative method for detecting abnormal media assets in accordance with an embodiment of the invention;
FIG. 2 is a flow chart of an alternative method of detecting an abnormal media asset according to an embodiment of the invention;
FIG. 3 is a flow chart of a method of generating an abnormal media asset according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative operational behavior-to-behavior encoding mapping relationship according to an embodiment of the present invention;
FIG. 5 is a flow chart of another alternative method of detecting an abnormal media asset according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative method of detecting abnormal media assets in accordance with an embodiment of the invention;
FIG. 7 is a schematic diagram of another alternative method of detecting abnormal media assets in accordance with an embodiment of the invention;
FIG. 8 is a schematic diagram of an alternative sequence feature extraction model according to an embodiment of the invention;
FIG. 9 is a schematic diagram of an alternative partitioning method for a sequence of resource operations according to an embodiment of the invention;
FIG. 10 is a schematic diagram of another alternative partitioning method for a sequence of resource operations according to an embodiment of the invention;
FIG. 11 is a schematic diagram of a partitioning method of yet another alternative sequence of resource operations according to an embodiment of the invention;
FIG. 12 is a flow chart of yet another alternative method of detecting an abnormal media asset according to an embodiment of the invention;
FIG. 13 is a clustering result illustration of an alternative set of operational sequences in accordance with an embodiment of the present invention;
FIG. 14 is a schematic diagram of an alternative abnormal media asset detection device according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of another alternative electronic device according to an embodiment of the present invention.
Detailed Description
So that those skilled in the art can better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The terms referred to in this application are described below:
Data distribution offset: the probability distribution of the data changes over time. In practice, many data characteristics change over time, so such changes in distribution need to be taken into account and handled. In anomaly detection, a shift in the data distribution is common, so a dedicated method is required to deal with it.
APP (Application): software installed on a terminal device (such as a smartphone) that makes up for the deficiencies of the original system and provides personalization.
Operation timing: the time-ordered sequence of operations that an account performs within an APP. Common operations may include, but are not limited to: praise (liking), comment, search, posting topics, and so forth.
Brushing behavior (also referred to as brush volume behavior or abnormal behavior): artificially increasing the browsing amount or interaction amount of a specified media resource (such as a video resource) on the Internet through abnormal operation means or techniques, in order to raise its popularity or ranking.
According to an aspect of the embodiment of the present invention, there is provided a method for detecting an abnormal media resource, which may be applied to, but is not limited to, the system for detecting abnormal media resources shown in FIG. 1, which is composed of a terminal device 102, a server 104 and a network 110. As shown in FIG. 1, the terminal device 102 is connected to and communicates with the server 104 via the network 110, which may include, but is not limited to, a wired network and a wireless network, wherein the wired network comprises local area networks, metropolitan area networks, and wide area networks, and the wireless network comprises Bluetooth, WIFI, and other networks that enable wireless communication. The terminal device may include, but is not limited to, at least one of: a mobile phone (e.g., an Android mobile phone, an iOS mobile phone, etc.), a notebook computer, a tablet computer, a palmtop computer, a MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, a vehicle-mounted device, etc. The terminal device 102 may be provided with a client that provides the function of detecting abnormal media resources.
The terminal device 102 is further provided with a display, a processor and a memory, wherein the display can be used for displaying a program interface of the resource platform, and various resource operations can be executed on a plurality of media resources in the program interface; the processor can be used for analyzing and playing the media resources; the memory is used for caching the acquired media resources.
The server 104 may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The server includes a database and a processing engine. The processing engine is used for clustering and analyzing the resource operation sequence; the database can be used for storing the resource operation sequences to be clustered.
According to an aspect of the embodiment of the present invention, the above-mentioned detection system for abnormal media resources may further perform the following steps: first, step S102 is performed on the terminal device 102, in which a brushing operation comprising a plurality of consecutive resource operations is performed on a plurality of media resources; next, step S104 is executed, and the operation sequence is sent to the server 104 through the network 110;
then, in the server 104, each object account is first taken as a candidate object account, and steps S106 to S112 are executed: a plurality of resource operation sequences of the candidate object account are acquired, wherein each resource operation sequence comprises a plurality of consecutive resource operations performed by the candidate object account on media resources on the resource platform; the sequence similarity between every two of the plurality of resource operation sequences is acquired, and the plurality of resource operation sequences are clustered according to the sequence similarity to obtain at least one operation sequence set; when the at least one operation sequence set includes an abnormal sequence set, the candidate object account is determined to be an abnormal object account, wherein the sequence similarity between the resource operation sequences included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold; and the abnormal media resources are determined from the media resource set associated with the abnormal object account, wherein the media resource set comprises the media resources corresponding to the resource operations performed by the abnormal object account.
The above is merely an example, and is not limited in any way in the present embodiment.
As an alternative embodiment, as shown in fig. 2, the method for detecting an abnormal media resource may include the following steps:
S202, acquiring a plurality of resource operation sequences of a candidate object account, wherein each resource operation sequence comprises a plurality of consecutive resource operations performed on media resources by the candidate object account on a resource platform;
S204, acquiring the sequence similarity between every two resource operation sequences in the plurality of resource operation sequences, and performing a clustering operation on the plurality of resource operation sequences according to the sequence similarity to obtain at least one operation sequence set;
S206, determining the candidate object account to be an abnormal object account when the at least one operation sequence set includes an abnormal sequence set, wherein the sequence similarity between each resource operation sequence included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold;
S208, determining abnormal media resources from the media resource set associated with the abnormal object account, wherein the media resource set comprises the media resources on which the abnormal object account has performed resource operations.
It should first be noted that, in the embodiments of the present application, the resource operations of an object account are acquired in a manner that complies with the relevant regulations, and authorization from the corresponding object account must be obtained before its resource operations are acquired.
It should be noted that the above embodiments of the present application may be applied to resource platforms including, but not limited to, video resource platforms, picture resource platforms and music resource platforms, in order to detect abnormal media resources. An abnormal media resource can be understood as a media resource whose popularity has been raised within a short time because the object account that published it purchased a cheating service, such as a brushing service, to boost the popularity of the media resource displayed in an application (depending on the resource platform, the media resource may be a video resource, a picture resource, an audio resource, etc.). Correspondingly, in the above embodiments, an abnormal object account is an object account that carries out the brushing (cheating) behavior.
The cheating behavior described above is explained below with reference to fig. 3. Taking a video platform (including streaming media platforms, short video platforms and the like) as an example: a user (object account) first publishes a video work on the video platform and then wants the work's traffic to be brushed; the user purchases a brushing service on a brushing platform, pays, and submits the link of the video work; the brushing platform then assigns brushing tasks to some of the object accounts on the video platform, and those object accounts perform brushing operations (such as praising, forwarding, collecting and commenting) on the video work on the video platform within a short time; the video platform detects that the heat of the video work has risen sharply in a short time and allocates more platform traffic to it, so that the heat of the work rises further.
According to the embodiments of the present application, abnormal object accounts (the object accounts that execute brushing tasks) can be detected on the platform, and abnormal media resources can then be searched for in the media resource set associated with each abnormal object account. The association may include, but is not limited to, operation associations such as media resources praised, forwarded or collected by the abnormal object account. In this way, abnormal media resources published on the platform can be found in batches through the abnormal object accounts.
In step S202, the candidate object account is the object account currently being examined. Optionally, the candidate object account may be obtained at random from the platform. In another embodiment, after an abnormal object account has been detected, candidate object accounts may be found through the media resources associated with it: for example, after an abnormal object account is found, the abnormal media resources on which it performed brushing operations are obtained, and candidate object accounts are then determined from the object accounts that commented on, forwarded or collected those abnormal media resources. This embodiment does not limit how the candidate object account is acquired.
The resource operation sequence in step S202 may include a plurality of consecutive resource operations performed on media resources by the candidate object account on the resource platform. The resource operations may include, but are not limited to, a search operation, a praise operation, a comment operation and a browse operation performed on a media resource; this embodiment does not limit the specific types of resource operation. In addition, the plurality of consecutive resource operations can be understood as operations that are consecutive in time. For example, if the candidate object account performs a first operation at a first time, a second operation at a second time and a third operation at a third time, the resource operation sequence is determined to be: first operation, second operation, third operation.
The resource operation sequence is described in detail below with reference to fig. 4, which shows the resource operations that may be available on the target media platform and the operation code corresponding to each of them. Before the plurality of resource operation sequences of the candidate object account are acquired, the operations of the candidate object account in the APP are first recorded in an operation log, and the operations are then sorted by occurrence time to generate an operation behavior sequence matching the operation log. For example, if a first candidate object account opens the video APP and performs the operations login, browse, praise, comment, browse and so on, the generated operation behavior sequence is: 101, 104, 105, 106, 104, ...
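The log-to-sequence step can be illustrated with a minimal sketch. It assumes a log of records with hypothetical `time` and `operation` fields; only the four operation codes that appear in the example above are included, and the record layout is an assumption made for illustration rather than the format used by any real platform.

```python
from datetime import datetime

# Operation codes taken from the example in the text; codes for any other
# operation would come from the platform's own coding table (fig. 4).
OPERATION_CODES = {"login": 101, "browse": 104, "praise": 105, "comment": 106}


def build_behavior_sequence(log_entries):
    """Sort log entries by occurrence time and map each operation to its code."""
    ordered = sorted(log_entries, key=lambda entry: entry["time"])
    return [OPERATION_CODES[entry["operation"]] for entry in ordered]


# Hypothetical log for one candidate object account, deliberately out of order.
log = [
    {"time": datetime(2023, 9, 1, 10, 0, 5), "operation": "browse"},
    {"time": datetime(2023, 9, 1, 10, 0, 0), "operation": "login"},
    {"time": datetime(2023, 9, 1, 10, 0, 9), "operation": "praise"},
    {"time": datetime(2023, 9, 1, 10, 0, 30), "operation": "browse"},
    {"time": datetime(2023, 9, 1, 10, 0, 20), "operation": "comment"},
]

behavior_sequence = build_behavior_sequence(log)
print(behavior_sequence)  # [101, 104, 105, 106, 104]
```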
The operation behavior time sequence can then be segmented into operation behavior fragments (that is, the resource operation sequences). After the operation time sequence of the candidate object account has been generated in the first step, a plurality of resource operation fragments are obtained from it, where each operation fragment consists of several operation behaviors. The fragments are generated mainly to make brushing behavior easier to capture: compared with the behavior of a normal user, brushing behavior is strongly goal-directed, so a certain number of consecutive suspicious behaviors are performed at high frequency. For example, a brushing team performs the behaviors search-browse-praise-comment in batches on certain works. The fragment "search-browse-praise-comment" therefore generally appears at high frequency in the behavior sequences of a brushing team but only rarely in the behavior sequences of normal users.
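The frequency argument can be made concrete with a short sketch. It assumes a sliding window of fixed length over the behavior sequence, which is one possible segmentation and is not prescribed by the text; the two sequences are invented for illustration, and operation names stand in for operation codes.

```python
from collections import Counter


def count_fragments(behavior_sequence, fragment_length):
    """Count every run of `fragment_length` consecutive operations (sliding window)."""
    fragments = (
        tuple(behavior_sequence[i:i + fragment_length])
        for i in range(len(behavior_sequence) - fragment_length + 1)
    )
    return Counter(fragments)


# Hypothetical sequences; operation names stand in for operation codes.
PATTERN = ("search", "browse", "praise", "comment")
brushing_sequence = list(PATTERN) * 3
normal_sequence = ["login", "browse", "browse", "search",
                   "browse", "praise", "browse", "comment"]

brushing_counts = count_fragments(brushing_sequence, 4)
normal_counts = count_fragments(normal_sequence, 4)
print(brushing_counts[PATTERN], normal_counts[PATTERN])  # 3 0
```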
Next, in step S204, the sequence similarity between every two resource operation sequences is acquired, and a clustering operation is then performed according to the sequence similarity to obtain at least one operation sequence set. The sequence similarity may be determined from the feature similarity between the operation features of the respective resource operations; the clustering operation may use, but is not limited to, a partition-based clustering method, a hierarchical clustering method, a streaming clustering method, and the like. This embodiment does not limit the specific clustering method.
It should be noted that, when the candidate object account has a large number of resource operation sequences, streaming clustering may be used to cluster them efficiently; when the number of resource operation sequences of the candidate object account is limited, hierarchical clustering may be used to improve the accuracy of the clustering result.
It can be understood that, by clustering the plurality of resource operation sequences of the candidate object account, the high-frequency operation sequences of the candidate object account can be determined from the clustering result. Step S206 then examines the clustering result to determine whether the high-frequency operation sequences of the candidate object account include an abnormal operation sequence, for example the operation sequence "search-browse-praise-comment", which closely matches the operating habits of abnormal object accounts. When the high-frequency operation sequences (that is, the operation sequence sets) of the candidate object account include the abnormal operation sequence "search-browse-praise-comment", the candidate object account can be determined to be an abnormal object account.
After the abnormal object account has been determined, step S208 may be executed to determine abnormal media resources from the media resource set associated with the abnormal object account. The media resource set associated with the abnormal object account may include, but is not limited to: a first resource subset composed of media resources historically commented on by the abnormal object account; a second resource subset composed of media resources historically forwarded by the abnormal object account; a third resource subset composed of media resources historically collected by the abnormal object account; and a fourth resource subset composed of media resources historically praised by the abnormal object account. Because only the media resource set associated with the abnormal object account is screened, screening all media resources on the platform is avoided and the detection efficiency of abnormal media resources is greatly improved.
Optionally, determining abnormal media resources from the media resource set associated with the abnormal object account includes at least one of the following:
Mode one: acquiring the media resource set associated with the abnormal object account, and extracting the resource operation features of each media resource in the media resource set, wherein the resource operation features indicate the change trend of each of a plurality of resource indexes of the media resource; and determining abnormal media resources from the media resource set according to the resource operation features;
Mode two: acquiring the media resource sets respectively associated with a plurality of abnormal object accounts, and determining the media resources included in the intersection of the media resource sets as abnormal media resources.
It can be understood that, in this embodiment, abnormal media resources can be further determined from the media resource set associated with the abnormal object account in either of the two modes above.
In mode one, whether the media resource set includes abnormal media resources can be further determined from the heat time-series data of the media resources. This is described below with reference to fig. 5.
S502, taking the media resources associated with the abnormal object account as targets, capturing the browse, praise, comment and other behaviors performed on them;
S504, counting the heat by time interval; specifically, the heat of the media resources of the object account is counted per time interval, where the heat statistics may be the number of views, collections, forwards, comments, praises and the like, and the time intervals may be 5 minutes, 30 minutes, 1 hour, 3 hours, 12 hours, etc.;
S506, accumulating heat data over a period of time to form time-series heat data;
it can be understood that, after heat data has been accumulated for a certain time, the heat data is sorted by time to form time-series data, and the heat time-series data is then evaluated;
S508, identifying media resources whose heat changes abnormally from the time-series heat data;
S510, judging the works with abnormal heat to be media resources with brushing behavior, and restricting those media resources.
It can be understood that, in the above steps, based on the evaluation of the heat time-series data, a video work whose heat data is highly suspicious is determined to be a media resource with brushing behavior and is not pushed, which prevents the platform from mistaking it for a high-quality work.
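Steps S504 to S508 can be sketched as follows. The text does not specify how an abnormal change in heat is identified, so the rule used here (a bucket whose count exceeds a multiple of the median of all earlier buckets) is purely an assumption for illustration, as are the parameter names `ratio` and `min_history` and the sample timestamps.

```python
from statistics import median


def bucket_counts(event_times, interval, num_buckets):
    """Count events (timestamps in seconds from a start point) per time interval."""
    counts = [0] * num_buckets
    for t in event_times:
        index = int(t // interval)
        if 0 <= index < num_buckets:
            counts[index] += 1
    return counts


def abnormal_buckets(counts, ratio=5.0, min_history=3):
    """Flag buckets whose count exceeds `ratio` times the median of all earlier buckets.

    This threshold rule is an illustrative assumption, not the patent's criterion.
    """
    flagged = []
    for i in range(min_history, len(counts)):
        baseline = max(median(counts[:i]), 1)
        if counts[i] > ratio * baseline:
            flagged.append(i)
    return flagged


# Hypothetical praise timestamps: a slow trickle, then a burst in the
# fifth 5-minute interval.
interval = 300
praise_times = [10, 200, 400, 650, 700, 1000] + [1200 + i for i in range(40)] + [1600]
counts = bucket_counts(praise_times, interval, 6)
print(counts)                    # [2, 1, 2, 1, 40, 1]
print(abnormal_buckets(counts))  # [4]
```

The same bucketing could be run separately for each heat statistic (views, collections, forwards, comments, praises) and for each of the interval lengths listed in S504.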
In mode two, the media resource sets respectively associated with a plurality of abnormal object accounts may be acquired, and the media resources included in the intersection of these media resource sets may be determined as abnormal media resources. In this way, abnormal media resources can be determined quickly by intersecting the media resources associated with the abnormal object accounts.
As shown in fig. 6, the media resource set associated with the first abnormal object account (for example, the media resources it has commented on) includes: resource a, resource b, resource c, resource d, resource e, resource f, resource g; the media resource set associated with the second abnormal object account includes: resource h, resource i, resource c, resource d, resource e, resource f, resource g. A brushing service purchased by one object account generally requires several brushing accounts (abnormal object accounts) to perform brushing operations, so the media resources associated with those brushing accounts usually overlap, and the overlapping part consists of the abnormal media resources for which the brushing service was purchased. The media resources in the intersection of the two sets, namely resource c, resource d, resource e, resource f and resource g, can therefore be directly determined as abnormal media resources.
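The fig. 6 example corresponds to a plain set intersection, shown here as a minimal sketch; the function name is hypothetical, and the resource labels are those used in the example above.

```python
def abnormal_by_intersection(resource_sets):
    """Return the media resources common to every abnormal account's resource set."""
    sets = [set(resources) for resources in resource_sets]
    return set.intersection(*sets) if sets else set()


first_account_resources = {"a", "b", "c", "d", "e", "f", "g"}
second_account_resources = {"h", "i", "c", "d", "e", "f", "g"}

abnormal = abnormal_by_intersection([first_account_resources, second_account_resources])
print(sorted(abnormal))  # ['c', 'd', 'e', 'f', 'g']
```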
Optionally, in mode two, before acquiring the media resource sets respectively associated with the plurality of abnormal object accounts and determining the media resources included in their intersection as abnormal media resources, the method further includes:
S1, acquiring a first media resource set associated with a first abnormal object account;
S2, traversing each first media resource in the first media resource set, and acquiring the object account set corresponding to each first media resource, wherein the object account set comprises a plurality of object accounts that have performed resource operations on the first media resource;
S3, determining a second abnormal object account from the object account set.
It can be understood that, in this embodiment, once the first abnormal object account has been determined, the first media resource set associated with it can be acquired, the accounts associated with the first media resource set are taken as candidate object accounts, a second abnormal object account is determined from those candidate object accounts, and abnormal media resources are then determined from the intersection of the media resource sets associated with the first and second abnormal object accounts.
For example, once the first abnormal object account has been determined, a resource set composed of the media resources it has commented on, praised, forwarded and collected is obtained; the object accounts that have commented on, praised, forwarded or collected the media resources in that resource set are then taken as candidate object accounts, and whether each of them is an abnormal object account is determined by the method described in S202 to S206. When a second abnormal object account is determined from the candidate object accounts, abnormal media resources are determined from the intersection of the media resource sets associated with the first and second abnormal object accounts.
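The candidate expansion in steps S1 to S3 can be sketched as below. Operation records are assumed to be simple (account, media resource) pairs, and the account and resource names are invented; whether each returned candidate is actually abnormal would still have to be decided by the method of S202 to S206.

```python
def expand_candidates(first_abnormal_account, operations):
    """Collect accounts that operated on any media resource touched by the first abnormal account.

    `operations` is an iterable of (account, media_resource) pairs.
    """
    operations = list(operations)
    first_resources = {res for acc, res in operations if acc == first_abnormal_account}
    return {
        acc for acc, res in operations
        if res in first_resources and acc != first_abnormal_account
    }


# Hypothetical operation records: (account, media resource operated on).
records = [
    ("u1", "video_c"), ("u1", "video_d"),
    ("u2", "video_c"), ("u2", "video_x"),
    ("u3", "video_y"),
    ("u4", "video_d"),
]
candidates = expand_candidates("u1", records)
print(sorted(candidates))  # ['u2', 'u4']
```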
It can be understood that, through the above embodiment, abnormal media resources are determined from the intersection of the media resources associated with the abnormal object accounts, so that abnormal media resources are detected rapidly and in batches and their detection efficiency is significantly improved. In addition, further abnormal object accounts are detected through the media resources associated with known abnormal object accounts, which avoids screening all object accounts as candidate object accounts and significantly improves the detection efficiency of abnormal object accounts.
In an optional embodiment, performing a clustering operation on the plurality of resource operation sequences according to the sequence similarity between the resource operation sequences to obtain at least one operation sequence set includes:
S1, determining each of the plurality of resource operation sequences as one candidate operation sequence set, and repeating the following steps until the set similarity between every two candidate operation sequence sets is smaller than a reference threshold:
S1-1, determining the set similarity between every two candidate operation sequence sets according to the sequence similarity, and merging the two candidate operation sequence sets whose set similarity meets the clustering condition to obtain an updated candidate operation sequence set;
S1-2, acquiring the set similarity between every two candidate operation sequence sets among the updated candidate operation sequence set and the remaining candidate operation sequence sets.
It can be understood that, in the clustering scenario of media resource operation sequences, the number of clusters is difficult to predict in advance, that is, it cannot be known beforehand how many clusters the sequences will be divided into. This embodiment therefore adopts a hierarchical clustering algorithm, which is flexible and does not require the number of clusters or many other parameters to be set in advance, and a plurality of operation sequence sets are obtained through efficient clustering.
Optionally, before the two candidate operation sequence sets whose set similarity meets the clustering condition are clustered, the method further includes one of the following:
Mode one: merging two reference operation sequence sets when the set similarity between them is higher than the set similarity between every other pair of candidate operation sequence sets;
Mode two: merging two reference operation sequence sets when the set similarity between them is higher than a similarity threshold.
It can be understood that this embodiment provides two alternative clustering modes: in mode one, the two candidate operation sequence sets with the highest set similarity among the plurality of candidate operation sequence sets are merged; in mode two, two candidate operation sequence sets whose set similarity is higher than the similarity threshold are merged. Mode one may be the preferred clustering mode.
The hierarchical clustering process is described below with reference to fig. 7. As shown in diagram (a) of fig. 7, the resource operation sequences to be clustered include sequence A, sequence B, sequence C, sequence D, sequence E and sequence F. In the first round of clustering, each operation sequence is taken as one candidate operation sequence set, and the set similarity between the candidate operation sequence sets is calculated. When the set similarity between the set containing sequence A and the set containing sequence B is the highest among all the set similarities, the two sets are merged, as shown in diagram (b) of fig. 7. The set similarities between the sets in diagram (b) are then calculated; when the set similarity between the set containing sequence C and the set containing sequence D is the highest, those two sets are merged, as shown in diagram (c) of fig. 7. Finally, the set similarities between the sets in diagram (c) are calculated; when the set similarity between the set containing sequence E and the set containing sequence F is the highest, those two sets are merged, giving the result shown in diagram (d) of fig. 7. When the set similarity between every two sequence sets in diagram (d) is smaller than the reference threshold, the sequence sets in diagram (d) are determined as the clustering result.
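The merge loop of mode one can be sketched in a few lines. Two stand-ins are assumed here and are not taken from the text: the sequence similarity is computed with `difflib.SequenceMatcher` from the Python standard library, and the set similarity is the average pairwise sequence similarity. The sample sequences and the threshold value are likewise hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def sequence_similarity(a, b):
    """Stand-in sequence similarity in [0, 1] (an assumption, not the patent's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def set_similarity(set_a, set_b):
    """Average pairwise sequence similarity between two sets (average linkage, assumed)."""
    pairs = [(a, b) for a in set_a for b in set_b]
    return sum(sequence_similarity(a, b) for a, b in pairs) / len(pairs)


def cluster(sequences, reference_threshold):
    """Merge the most similar pair of sets until every pair is below the threshold."""
    sets = [[seq] for seq in sequences]  # each sequence starts as its own set
    while len(sets) > 1:
        (i, j), best = max(
            (((i, j), set_similarity(sets[i], sets[j]))
             for i, j in combinations(range(len(sets)), 2)),
            key=lambda item: item[1],
        )
        if best < reference_threshold:
            break
        sets[i] = sets[i] + sets[j]
        del sets[j]
    return sets


# Hypothetical resource operation sequences.
seq_a = ("search", "browse", "praise", "comment")
seq_b = ("search", "browse", "praise", "comment")
seq_c = ("login", "browse", "browse", "collect")
seq_d = ("login", "browse", "browse", "forward")

clusters = cluster([seq_a, seq_b, seq_c, seq_d], reference_threshold=0.6)
print([len(s) for s in clusters])  # [2, 2]
```

Recomputing every pairwise set similarity on each pass keeps the sketch short; a practical implementation would cache the similarities that a merge does not affect.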
Through the above embodiment of the present application, each of the plurality of resource operation sequences is determined as one candidate operation sequence set, and the following steps are repeated until the set similarity between every two candidate operation sequence sets is smaller than the reference threshold: the set similarity between every two candidate operation sequence sets is determined according to the sequence similarity, and the two candidate operation sequence sets whose set similarity meets the clustering condition are merged to obtain an updated candidate operation sequence set; and the set similarity between every two candidate operation sequence sets is acquired among the updated candidate operation sequence set and the remaining candidate operation sequence sets. Efficient clustering of the sets of resource operation sequences is thereby achieved, which further improves the detection efficiency of abnormal object accounts.
In an optional embodiment, the determining the set similarity between each two candidate operation sequence sets according to the sequence similarity includes:
S1, when each of the two obtained candidate operation sequence sets includes one resource operation sequence, taking the sequence similarity between the two resource operation sequences as the set similarity;
S2, when each of the two obtained candidate operation sequence sets includes a plurality of resource operation sequences, determining the set feature of each candidate operation sequence set from the sequence features of the resource operation sequences it includes, and determining the feature distance between the set features of the two candidate operation sequence sets as the set similarity;
S3, when a first candidate operation sequence set of the two obtained candidate operation sequence sets includes one resource operation sequence and a second candidate operation sequence set of the two includes a plurality of resource operation sequences, determining the set feature of the second candidate operation sequence set from the sequence features of the resource operation sequences it includes; and acquiring the sequence feature of the resource operation sequence included in the first candidate operation sequence set, and determining the feature distance between the set feature and the sequence feature as the set similarity.
It will be appreciated that in the present embodiment, a method of determining the set similarity between each two candidate operation sequence sets based on the sequence similarity in each case is given.
In the first case, where each of the two sets includes only one resource operation sequence, the sequence similarity between the two resource operation sequences may be directly determined as the set similarity between the two candidate operation sequence sets.
In the second case, where each of the two sets includes a plurality of sequences, the set feature of each sequence set may be determined from the sequence features of the operation sequences it includes, and the set similarity may then be determined from the feature distance between the set features. For example, the weighted average, in the feature space, of the sequence features of the operation sequences in a set can be used as the set feature of that set, and the set similarity is then determined from the feature distance between the set features.
In another alternative manner, the feature distances between the sequence features of every pair of operation sequences drawn from the two sets can be determined, and the weighted average of these sequence feature distances is taken as the feature distance between the sets; or the largest of the sequence feature distances is taken as the feature distance between the sets; or the smallest of the sequence feature distances is taken as the feature distance between the sets.
In the third case, where one set includes one resource operation sequence and the other set includes a plurality of resource operation sequences, the set feature of the set containing the plurality of sequences may be determined from the sequence features of the operation sequences it includes, and the feature distance between the two sets may be determined from that set feature and the sequence feature of the resource operation sequence included in the other set, so as to determine the set similarity.
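The three cases can be handled by a single computation if the set feature of a one-sequence set is taken to be that sequence's own feature. The sketch below assumes equal weights for the average and Euclidean distance between features; both choices, and the two-dimensional sample features, are illustrative assumptions. A smaller distance corresponds to more similar sets.

```python
import math


def set_feature(sequence_features):
    """Set feature as the mean of the member sequence features (equal weights assumed)."""
    count = len(sequence_features)
    return [sum(values) / count for values in zip(*sequence_features)]


def set_distance(features_a, features_b):
    """Euclidean distance between the set features of two candidate sets.

    A set holding a single sequence has that sequence's feature as its set
    feature, so the three cases in the text reduce to one computation.
    """
    return math.dist(set_feature(features_a), set_feature(features_b))


# Hypothetical 2-dimensional sequence features.
single_a = [[0.0, 0.0]]
single_b = [[3.0, 4.0]]
multi = [[2.0, 0.0], [4.0, 8.0]]  # mean is [3.0, 4.0]

print(set_distance(single_a, single_b))  # case 1: 5.0
print(set_distance(multi, multi))        # case 2: 0.0
print(set_distance(single_a, multi))     # case 3: 5.0
```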
Through the above embodiment of the present application, when each of the two obtained candidate operation sequence sets includes one resource operation sequence, the sequence similarity between the two resource operation sequences is taken as the set similarity; when each of the two sets includes a plurality of resource operation sequences, the set feature of each set is determined from the sequence features of the resource operation sequences it includes, and the feature distance between the two set features is determined as the set similarity; when a first of the two sets includes one resource operation sequence and the second includes a plurality of resource operation sequences, the set feature of the second set is determined from the sequence features of its resource operation sequences, the sequence feature of the resource operation sequence in the first set is acquired, and the feature distance between the set feature and the sequence feature is determined as the set similarity. The set distance between operation sequence sets is thus determined accurately in each of these cases, which improves the accuracy of the clustering result.
In an optional embodiment, before determining the plurality of resource operation sequences as one candidate operation sequence set, the method further includes:
S1, respectively acquiring the sequence feature of each resource operation sequence in the plurality of resource operation sequences through a target feature extraction model;
S2, sequentially determining the feature distance between every two sequence features, and determining the feature distance as the sequence similarity between the two corresponding resource operation sequences.
It can be understood that in this embodiment, the sequence feature of each operation sequence may be extracted through the target feature extraction model, so that similar operation sequences lie closer together in the feature space, which improves the clustering result.
In an alternative manner, when the data volume is very large, the CBOW (continuous bag-of-words) algorithm of word2vec may be adopted to learn the vector representation of the operation behavior segments, which improves the learning speed.
In an alternative embodiment, the Skip-gram model of the word2vec algorithm is used to extract the sequence features, for more thorough mining. As shown in fig. 8, when the input is segment t, it is first mapped by w(t) to obtain the vector representation corresponding to segment t; that vector representation is then inversely mapped by w(t-2) to obtain a fitted segment t-2, a loss is computed between the fitted segment t-2 and the real segment t-2, and the model parameters of mappings such as w(t) and w(t-2) are adjusted accordingly during training. Similarly, the vector representation corresponding to segment t can be inversely mapped by w(t-1) to obtain a fitted segment t-1, and a loss is computed between the fitted segment t-1 and the real segment t-1.
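As an illustrative sketch only, the (center segment, context segment) pairs on which a Skip-gram style model would be trained can be enumerated as below. The context window of 2 is an assumption suggested by the t-2 and t-1 positions mentioned above, and the helper name `skipgram_pairs` is hypothetical; the model itself and its loss are not shown.

```python
def skipgram_pairs(segments, window=2):
    """Return (center, context) pairs for every segment and its neighbours.

    For the segment at position t, every segment at positions t-window..t+window
    (excluding t itself and positions outside the sequence) is a context target.
    """
    pairs = []
    for t, center in enumerate(segments):
        lo = max(0, t - window)
        hi = min(len(segments), t + window + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, segments[j]))
    return pairs

# Placeholder segment identifiers in order of appearance.
pairs = skipgram_pairs(["s1", "s2", "s3", "s4"], window=2)
```

Each pair corresponds to one prediction of a neighbouring segment from the vector of the center segment, which is the quantity the loss described above is computed on.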
In this embodiment, the plurality of resource operation sequences of the candidate object account may be ordered in time. For example, according to the operation log of the candidate object account, the operations can be determined to occur in the order: operation 1, operation 2, operation 3, operation 4, operation 5, operation 6. When every two operations form one resource operation sequence, the first operation sequence is operation 1-operation 2, the second operation sequence is operation 3-operation 4, and the third operation sequence is operation 5-operation 6. When the second operation sequence is taken as segment t, the first operation sequence is segment t-1 and the third operation sequence is segment t+1.
After the sequence features corresponding to the resource operation sequences are determined in the above manner, the sequence similarity between the resource operation sequences is determined. In an alternative manner, when the sequence feature is a feature vector, the feature distance between sequence features may be, but is not limited to, a Euclidean distance, a Manhattan distance, a Chebyshev distance, a Mahalanobis distance, or the like. In the present embodiment, the manner of determining the feature distance is not limited.
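For illustration, three of the distances named above can be computed between every two sequence features as in the following sketch. The helper names are hypothetical, and the Mahalanobis distance is left out because it additionally needs a covariance estimate of the feature space.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared component differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute component difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def pairwise_distances(features, metric):
    """Distance between every two sequence features, keyed by (i, j) with i < j."""
    n = len(features)
    return {
        (i, j): metric(features[i], features[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

features = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
distances = pairwise_distances(features, euclidean)
```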
In the above embodiment of the present application, the sequence feature of each of the plurality of resource operation sequences is obtained by using the target feature extraction model; and sequentially determining the feature distance between every two sequence features, and determining the feature distance as the sequence similarity between the two corresponding resource operation sequences, so as to accurately determine the sequence similarity between each resource operation sequence.
In an optional embodiment, the acquiring the plurality of resource operation sequences of the candidate object account includes:
S1, acquiring an operation log of the candidate object account in a target period, wherein the operation log includes operation records of the resource operations performed on media resources by the candidate object account on the resource platform;
S2, generating a reference resource operation sequence of the candidate object account in the target period from the operation records included in the operation log, in time order;
S3, dividing the reference resource operation sequence according to the target length to obtain a plurality of resource operation sequences, wherein each resource operation sequence in the plurality of resource operation sequences includes the same number of resource operations.
Two specific ways of acquiring the resource operation sequence are described below with reference to fig. 9, 10 and 11.
First, the operations of the candidate object account in the APP are stored in an operation log, and the user's behavior operations are then sorted by occurrence time to generate the reference resource operation sequence. The correspondence between the various resource operations and operation codes is shown in fig. 4. For example, if a first candidate object account opens the video APP on a mobile phone and performs the operations login, browse, praise, comment, browse, and so on, the generated reference resource operation sequence is: 101, 104, 105, 106, 104, …
After the reference resource operation sequence of the candidate object account is generated, it needs to be further divided to obtain a plurality of operation behavior segments (i.e., the plurality of resource operation sequences), where an operation behavior segment may be defined as being composed of a plurality of operation behaviors.
Operation behavior segments are generated mainly to make brushing operations easier to capture. Compared with the diffuse, purposeless operations of a normal object account, brushing operations are strongly purposeful, and a certain number of consecutive suspicious operations are executed at high frequency; for example, a brushing team may repeatedly execute, in batches, the actions of searching, browsing, praising, and commenting on certain works. The operation behavior segment "search-browse-praise-comment" therefore generally appears at high frequency in the operation sequences of a brushing team, but at low frequency in the operation sequences of a normal object account.
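The generation of the reference resource operation sequence from a log can be sketched as follows. The operation codes in `OP_CODES` are inferred from the example above (login, browse, praise, comment mapping to 101, 104, 105, 106) and are assumptions; the full table of fig. 4 is not reproduced here, and the record layout `(timestamp, operation)` is hypothetical.

```python
# Assumed operation codes, inferred from the example sequence 101, 104, 105, 106, 104.
OP_CODES = {"login": 101, "browse": 104, "praise": 105, "comment": 106}

def build_reference_sequence(log_records):
    """Sort (timestamp, operation) records by time and map each operation to its code."""
    ordered = sorted(log_records, key=lambda record: record[0])
    return [OP_CODES[operation] for _, operation in ordered]

# Log records arrive in arbitrary order; the timestamps restore the order of occurrence.
log = [(3, "praise"), (1, "login"), (5, "browse"), (2, "browse"), (4, "comment")]
reference_sequence = build_reference_sequence(log)
```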
The operation behavior sequence may be segmented in the n-gram manner. There are two ways of segmenting: overlapping and non-overlapping. Overlapping segmentation is shown in fig. 9 and non-overlapping segmentation is shown in fig. 10; the difference between them is apparent from fig. 9 and fig. 10.
Here, the window is set to 4. For the reference resource operation sequence "101-104-105-106-104-105-104-106-107-104", when overlapping segmentation is used, as shown in fig. 9, the sequence is divided into: "101-104-105-106", "104-105-106-104", "105-106-104-105", "106-104-105-104", "104-105-104-106", etc. When non-overlapping segmentation is used, as shown in fig. 10, the sequence is divided into: "101-104-105-106", "104-105-104-106", "104-106-107-104".
Optionally, when the operation behavior sequence is short, overlapping segmentation is preferred, because it segments the operation behavior sequence more thoroughly.
Optionally, when the operation behavior sequence is long, non-overlapping segmentation is preferred. If the last operation behavior segment obtained by non-overlapping segmentation is not long enough, the operation behaviors immediately preceding it are supplemented into the last segment.
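A minimal sketch of the two segmentation manners, using the window of 4 and the example sequence given above, is shown below. The function names are hypothetical; the handling of a short final segment follows the rule of supplementing it with the preceding operations, which is implemented here by taking the last `n` operations of the sequence.

```python
def split_overlapping(sequence, n):
    """Slide a window of length n one operation at a time."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]

def split_non_overlapping(sequence, n):
    """Cut consecutive windows of length n; pad a short tail with preceding operations."""
    segments = [sequence[i:i + n] for i in range(0, len(sequence) - n + 1, n)]
    if len(sequence) >= n and len(sequence) % n != 0:
        # The remaining tail is shorter than n, so reuse earlier operations to fill it.
        segments.append(sequence[-n:])
    return segments

reference = [101, 104, 105, 106, 104, 105, 104, 106, 107, 104]
overlapping = split_overlapping(reference, 4)
non_overlapping = split_non_overlapping(reference, 4)
```

For this ten-operation sequence the overlapping manner yields seven segments and the non-overlapping manner yields three, the last of which reuses two operations from the second segment.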
Further, as shown in fig. 11, for the overlapping segmentation method in fig. 9, the candidate account reference resource operation sequence is segmented into 7 operation behavior segments, namely, operation behavior segment 1 to operation behavior segment 7.
Then, the 7 operation behavior segments are connected in their order of appearance to generate an operation behavior segment sequence, as shown in fig. 11. The whole behavior operation sequence of the candidate object account can be regarded as an article and each operation behavior as a character. Because a single character explains the meaning of an article less well than a word does, the article can be segmented into words; correspondingly, the reference resource operation sequence can be segmented into resource operation sequences, i.e., operation behavior segments, each of which represents a continuous operation feature. A sequence is then formed from the operation behavior segments, and the vector representation of each operation behavior segment is learned by the method shown in fig. 8.
In the above embodiment of the present application, an operation log of the candidate object account in the target period is acquired, where the operation log includes operation records of the resource operations performed on media resources by the candidate object account on the resource platform; a reference resource operation sequence of the candidate object account in the target period is generated from the operation records in the operation log in time order; and the reference resource operation sequence is divided according to the target length to obtain a plurality of resource operation sequences, each of which includes the same number of resource operations. Dividing the reference resource operation sequence of an object account into a plurality of resource operation sequences of the same length allows highly aggregated behavior of the object account to be identified accurately.
In an optional embodiment, in the case that the at least one operation sequence set includes an abnormal sequence set, before determining the candidate object account as the abnormal object account, one of the following is further included:
Mode one: acquiring the set feature of a third operation sequence set in the at least one operation sequence set, and determining the third operation sequence set as an abnormal sequence set when the feature similarity between the set feature and the sequence feature of the abnormal operation sequence is greater than or equal to a first threshold;
Mode two: acquiring a plurality of third resource operation sequences included in the third operation sequence set in the at least one operation sequence set, and determining the third operation sequence set as an abnormal sequence set when the proportion of abnormal operation sequences among the plurality of third resource operation sequences is greater than or equal to a second threshold.
It will be appreciated that in this embodiment, at least three ways are provided to determine whether the above set of operation sequences is an abnormal set of sequences.
In mode one, the feature similarity between the set feature of the operation sequence set and the sequence feature of the abnormal operation sequence may be obtained, and whether the operation sequence set is an abnormal sequence set (i.e., a set indicating that the candidate object account has aggregated abnormal operation behavior sequences) is determined according to that feature similarity. Optionally, the set feature of the operation sequence set may be determined according to the sequence features of the resource operation sequences it includes; for example, the set feature may be the average, in the feature space, of the sequence features of all the resource operation sequences included in the operation sequence set.
In mode two, whether the operation sequence set is an abnormal sequence set may be determined according to the proportion of abnormal operation sequences among the resource operation sequences included in the set. For example, the operation combinations that constitute abnormal operation sequences, such as "search-browse-praise-comment", "search-browse-praise-comment-collection", and "search-browse-praise-collection", may be determined first; the resource operation sequences included in the operation sequence set are then compared against these combinations to determine the proportion of abnormal operation sequences in the set, and if the proportion is greater than a certain value, the operation sequence set may be determined to be an abnormal sequence set.
In a third mode, the processing may be performed by combining the first mode and the second mode, that is, on one hand, a first comparison result may be obtained by comparing the feature similarity between the set feature and the sequence feature of the abnormal operation sequence, and on the other hand, a comparison result between the duty ratio of the abnormal operation sequence in the resource operation sequence and the second threshold may be obtained, so as to determine whether the operation sequence set is the abnormal sequence set according to the two comparison results.
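The three modes can be sketched as follows, under several assumptions that the embodiment leaves open: cosine similarity stands in for the unspecified feature similarity, the set feature is the mean of the sequence features, and mode three requires both comparisons to succeed. All function names and threshold values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Component-wise mean, used here as the set feature."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def abnormal_by_feature(sequence_features, abnormal_feature, first_threshold):
    """Mode one: compare the set feature with the abnormal sequence feature."""
    similarity = cosine_similarity(mean_vector(sequence_features), abnormal_feature)
    return similarity >= first_threshold

def abnormal_by_ratio(sequences, abnormal_patterns, second_threshold):
    """Mode two: proportion of sequences that match a known abnormal combination."""
    hits = sum(1 for sequence in sequences if tuple(sequence) in abnormal_patterns)
    return hits / len(sequences) >= second_threshold

def abnormal_combined(sequence_features, abnormal_feature, first_threshold,
                      sequences, abnormal_patterns, second_threshold):
    """Mode three, assumed here to require both comparison results to be positive."""
    return (abnormal_by_feature(sequence_features, abnormal_feature, first_threshold)
            and abnormal_by_ratio(sequences, abnormal_patterns, second_threshold))

patterns = {("search", "browse", "praise", "comment")}
cluster = [("search", "browse", "praise", "comment")] * 3 + [("login", "browse", "browse", "browse")]
ratio_flag = abnormal_by_ratio(cluster, patterns, 0.7)  # 3 of 4 sequences match
```

Whether mode three should combine the two results with a logical AND, a logical OR, or a weighted score is a design choice the embodiment does not fix.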
Optionally, in addition to checking whether the operation sequence set is an abnormal sequence set in the above manners, the distribution in time of the operation sequences in the set may be further examined. It can be understood that a brushing account exhibits aggregation not only in space (i.e., the operation sequences corresponding to the brushing operation are large in number) but also in time, i.e., the resource operation sequences corresponding to the cheating behavior appear clustered within a short time.
For example, suppose account A performs the suspicious operation sequence "search-browse-praise-comment" once a day. Taking one year as the period, the sequence set corresponding to "search-browse-praise-comment" in the operation sequence clustering result of account A includes at least 365 operation sequences. Account B is a newly created account that performs the "search-browse-praise-comment" operation 100 times in one day, so the corresponding sequence set in the clustering result of account B includes 100 operation sequences. Judged only by the scale of aggregation in space, the set corresponding to "search-browse-praise-comment" for account A would be judged, with higher probability, to be an abnormal operation sequence set, and account A would then be judged to be an abnormal object account; empirically, however, account B is actually the more likely to be an abnormal object account.
Therefore, to take the aggregation in time of abnormal operation sequences into account, a time aggregation parameter may further be configured for each operation sequence set to indicate how aggregated the resource operation sequences in the set are in the time dimension, with a higher confidence configured for sets that are more aggregated in time. For example, the timestamp corresponding to each resource operation sequence may be obtained, and the mean square error of the timestamps of the resource operation sequences in the set may be used as the index for evaluating the time aggregation of the set.
For example, for operation sequence set A, the probability that the set is an abnormal sequence set is determined to be 90% according to mode three above, and the aggregation confidence of the set is then determined to be 60% according to the sequence timestamp mean square error index (i.e., although the degree of aggregation in space is high, the degree of aggregation in time is low), so that the final probability that set A is an abnormal sequence set can be determined to be 90% × 60% = 54%. For operation sequence set B, the probability that the set is an abnormal sequence set is determined to be 80% according to mode three, and the aggregation confidence of the set is determined to be 80% according to the sequence timestamp mean square error index (i.e., the degree of aggregation in time is higher), so that the final probability that set B is an abnormal sequence set can be determined to be 80% × 80% = 64%. It can thus be determined that the probability that set B is an abnormal sequence set is higher than the probability that set A is an abnormal sequence set.
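The temporal index and the final combination can be sketched as below. The mean square error is taken here as the mean squared deviation of the timestamps from their mean, which is an assumption, and the figures 90%/60% and 80%/80% are read from the worked example above. How the mean square error is mapped to an aggregation confidence is not specified in the embodiment, so that mapping is deliberately not implemented; the function names are hypothetical.

```python
def timestamp_mean_square_error(timestamps):
    """Mean squared deviation of the timestamps from their mean (smaller = more aggregated)."""
    mean = sum(timestamps) / len(timestamps)
    return sum((t - mean) ** 2 for t in timestamps) / len(timestamps)

def final_probability(abnormal_probability, aggregation_confidence):
    """Weight the spatial judgement by the temporal aggregation confidence."""
    return abnormal_probability * aggregation_confidence

# One operation sequence per day (seconds) vs. three sequences within two minutes.
spread_out = timestamp_mean_square_error([0, 86400, 172800])
bursty = timestamp_mean_square_error([0, 60, 120])

set_a = final_probability(0.90, 0.60)  # high spatial aggregation, low temporal aggregation
set_b = final_probability(0.80, 0.80)  # higher temporal aggregation
```

With these inputs set B ends up with the higher final probability, matching the conclusion of the example.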
According to the embodiment of the application, whether the corresponding set is the abnormal sequence set can be judged through the aggregation characteristic of the abnormal operation sequence in space and the aggregation characteristic in time, so that cheating brushing amount behaviors are accurately identified, and whether the object account is an abnormal object account is accurately judged.
In an optional embodiment, after determining the abnormal media resource from the media resource set associated with the abnormal object account, at least one of the following is further included:
Mode one: reducing the push probability of pushing the abnormal media resource to object accounts on the resource platform;
Mode two: pushing prompt information to the target object account that published the abnormal media resource;
Mode three: setting the abnormal media resource to an inaccessible state on the resource platform.
It will be appreciated that, after the abnormal media resource is identified, the abnormal media resource and the object account that published it may be handled in a number of ways. For example, the push probability of the abnormal media resource on the platform may be reduced, i.e., the resource is throttled; for another example, prompt information may be pushed to the object account that published the abnormal media resource, indicating that the account is suspected of obtaining higher traffic for the resource through non-compliant operations, thereby warning the account about the cheating behavior; furthermore, the abnormal media resource may be set to an invisible state on the platform, reducing its recommendation and propagation.
In yet another embodiment, a related "punishment" operation may also be performed on the abnormal object account (i.e., the account that performed the brushing operation). For example, the account behaviors of the abnormal object account, such as browsing, praising, collecting, and commenting, may be restricted; the account behaviors of the abnormal object account may also be reset, for example by clearing the behavior data of the browsing, praising, collecting, and commenting behaviors that the abnormal object account performed on the platform; the abnormal object account may even be blocked, for example by forbidding it from logging in for a period of time. In the present embodiment, the operation type of the related management operation for the abnormal object account is not limited.
A complete embodiment of the present application is described below in conjunction with fig. 12.
S1202, acquiring an operation log of a candidate object account in a target period;
It should be noted that in this step, the operations of the candidate object account in the video application need to be saved as a log, and the user's behavior operations are then sorted by occurrence time to generate an operation behavior sequence. When saving the operations as a log, the operation code and operation timestamp of each operation may be recorded according to the mapping relationship shown in fig. 4, so that the operation behavior time sequence of the candidate object account (i.e., the above reference resource operation sequence) can be obtained. For example, if object account A opens the video APP and performs the operations login, browse, praise, comment, browse, and so on, the correspondingly generated operation behavior sequence is: 101, 104, 105, 106, 104, …
S1204, dividing the operation behavior sequence of the candidate object account into a plurality of resource operation sequences in the form of n-gram according to the operation log;
In this embodiment, after the reference resource operation sequence of the candidate object account is generated, it needs to be further divided to obtain a plurality of operation behavior segments (i.e., the plurality of resource operation sequences described above), where an operation behavior segment may be defined as being composed of a plurality of operation behaviors. The operation behavior sequence may be segmented in the n-gram manner. There are two ways of segmenting: overlapping and non-overlapping. Overlapping segmentation is shown in fig. 9, and non-overlapping segmentation is shown in fig. 10.
Optionally, when the operation behavior sequence is short, overlapping segmentation is preferred, because it segments the operation behavior sequence more thoroughly.
Optionally, when the operation behavior sequence is long, non-overlapping segmentation is preferred. If the last operation behavior segment obtained by non-overlapping segmentation is not long enough, the operation behaviors immediately preceding it are supplemented into the last segment.
S1206, respectively acquiring sequence features of each resource operation sequence through a target feature extraction model;
It can be understood that in this embodiment, the sequence feature of each operation sequence may be extracted through the target feature extraction model, so that similar operation sequences lie closer together in the feature space, which improves the clustering result; specifically, in this step, a pre-trained Skip-gram model may be used to output the sequence feature (i.e., the vector representation) of each resource operation sequence;
S1208, determining the sequence similarity between every two resource operation sequences according to the sequence features;
in this implementation step, the sequence similarity between every two resource operation sequences may be characterized according to the feature distance between the sequence features of every two resource operation sequences.
S1210, performing a clustering operation according to the sequence similarity between every two resource operation sequences to obtain a plurality of operation sequence sets; specifically, S1210 may further include S1210-1, S1210-2 and S1210-3:
S1210-1, taking each sequence as a separate cluster, and determining the distance between every two clusters through a target distance algorithm;
S1210-2, merging the two clusters that are closest to each other, i.e., the pair with the smallest average-linkage distance, into one cluster;
S1210-3, judging whether the distances between all clusters are greater than the merging threshold; if the distance between any two clusters is less than or equal to the merging threshold, repeating step S1210-2, and if the distances between all clusters are greater than the merging threshold, determining that the clustering ends.
An exemplary clustering result is shown in fig. 13, in which c1, c2, and c3 are aggregated clusters of similar sequences and the remainder are outlying small clusters.
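Steps S1210-1 to S1210-3 can be illustrated with the following sketch of agglomerative clustering. It assumes Euclidean distance as the target distance algorithm and average linkage (the mean distance over all cross-cluster pairs) as the cluster distance; the function names and the threshold value are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all pairs drawn from the two clusters."""
    total = sum(euclidean(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerative_cluster(features, merge_threshold):
    """Merge the closest pair of clusters until every pair is farther than the threshold."""
    clusters = [[f] for f in features]  # S1210-1: each sequence starts as its own cluster
    while len(clusters) > 1:
        (i, j), distance = min(
            (((i, j), average_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if distance > merge_threshold:  # S1210-3: all pairs exceed the threshold, stop
            break
        clusters[i] = clusters[i] + clusters[j]  # S1210-2: merge the closest pair
        del clusters[j]
    return clusters

features = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
clusters = agglomerative_cluster(features, merge_threshold=2.0)
```

In this toy input the two nearby pairs each merge into one cluster, while the distant point remains an outlying single-element cluster.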
S1212, traversing each operation sequence set to judge whether the operation sequence set is an abnormal sequence set;
It will be appreciated that, as can be seen from the clustering result obtained in S1210, aggregated operation behavior segments are more suspicious than scattered ones, so the aggregated clusters, such as c1, c2, and c3, can be verified preferentially. The verification may include steps S1212-1 and S1212-2: obtaining the spatial feature index and the temporal feature index of the current sequence set; and judging whether the current sequence set is an abnormal sequence set according to the spatial feature index and the temporal feature index.
Specifically, the spatial feature index may include, but is not limited to, indexes such as the similarity between the cluster feature and the sequence feature of the abnormal operation sequence, the proportion of abnormal operation sequences among the sequences included in the cluster, and the scale of the cluster; the temporal feature index may include the mean square error of the operation timestamps corresponding to the sequences included in the cluster. Whether each set is an abnormal sequence set is then judged according to the spatial feature index and the temporal feature index.
In another alternative, S1212 may also use the operation behavior segments in the sets as suspicious segments, and then sample and verify the suspicious segments, and finally determine abnormal operation behavior segments.
S1214, determining the candidate object account as an abnormal object account; it can be understood that, after the abnormal operation behavior segments are determined, an object account that hits the abnormal operation behavior segments with a high degree of aggregation is listed as an abnormal object account.
S1216, acquiring a media resource set associated with the abnormal object account;
specifically, the association relationship may include, but is not limited to, an operation association relationship such as a media resource that is endorsed by the abnormal object account, a forwarded media resource, a collected media resource, and the like.
S1218, determining abnormal media resources from the media resource set; specifically, the step S1218 may further include steps S1218-1, S1218-2, and S1218-3, that is, determining a candidate object account from the object account set associated with the media resource set; S1202-S1214 are repeated to determine an abnormal object account from the candidate object accounts; determining abnormal media resources according to intersections of media resource sets associated with the plurality of abnormal object accounts;
It can be understood that, after the abnormal object accounts are determined, the object accounts that have commented on and praised the same media resources are grouped together according to the media resources they commented on and praised, and these accounts are listed as an abnormal object account set. Finally, the works that the abnormal object account set has browsed, collected, commented on, or praised many times are classified as abnormal media resources.
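The intersection step of S1218 can be sketched as follows. The input layout (a dictionary from abnormal object account to the set of media resources it operated on) and the optional `min_accounts` relaxation are assumptions made for illustration; the function name is hypothetical.

```python
from collections import Counter

def abnormal_media(media_by_account, min_accounts=None):
    """Return the media resources operated on by the abnormal object accounts.

    By default a resource must appear in the set of every account (a strict
    intersection); min_accounts relaxes this to "at least that many accounts".
    """
    if min_accounts is None:
        min_accounts = len(media_by_account)
    counts = Counter(
        media for resources in media_by_account.values() for media in set(resources)
    )
    return {media for media, count in counts.items() if count >= min_accounts}

media_by_account = {
    "account_a": {"video_1", "video_2"},
    "account_b": {"video_1", "video_3"},
    "account_c": {"video_1", "video_2"},
}
strict = abnormal_media(media_by_account)
relaxed = abnormal_media(media_by_account, min_accounts=2)
```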
S1220, a management operation is performed on the abnormal media resource. For example, the abnormal media resources may be subject to a current limiting operation, reducing the chance that they are recommended and propagated at the platform.
According to the embodiment of the application, abnormal operation behavior segments are clustered based on the aggregation of the operation behaviors of the cheating object account set, and the object accounts and account sets with abnormal operations are then mined. Starting from the executors of the brushing, after an abnormal object account set is mined, management operations are performed in batches on the works whose volume it brushed, so that video works can be detected and quickly throttled after being brushed. As abnormal account sets are continuously mined and management is strengthened, the cheating cost of the abnormal account sets keeps rising, cheating media resources are further reduced, and the resource quality of the video platform is improved.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present invention, there is also provided a device for detecting an abnormal media resource for implementing the method for detecting an abnormal media resource. As shown in fig. 14, the apparatus includes:
an obtaining unit 1402, configured to obtain a plurality of resource operation sequences of a candidate object account, where the resource operation sequences include a plurality of continuous resource operations performed on a media resource by the candidate object account on a resource platform;
a clustering unit 1404, configured to perform clustering operation on a plurality of the resource operation sequences according to the sequence similarity between the resource operation sequences, to obtain at least one operation sequence set;
a determining unit 1406 configured to determine, when at least one of the operation sequence sets includes an abnormal sequence set, the candidate object account as an abnormal object account, where the sequence similarity between the resource operation sequence and the abnormal operation sequence included in the abnormal sequence set is greater than or equal to a target threshold;
the detecting unit 1408 is configured to determine an abnormal media resource from a media resource set associated with the abnormal object account, where the media resource set includes media resources corresponding to the resource operation performed by the abnormal object account.
Optionally, the clustering unit 1404 includes: the clustering module is used for respectively determining a plurality of resource operation sequences as a candidate operation sequence set, and repeating the following steps until the set similarity between every two candidate operation sequence sets is smaller than a reference threshold value: respectively determining the set similarity between every two candidate operation sequence sets according to the sequence similarity, and clustering the two operation sequence sets with the set similarity meeting a clustering condition to obtain an updated candidate operation sequence set; and acquiring the set similarity between every two candidate operation sequence sets according to the updated candidate operation sequence sets and the rest candidate operation sequence sets.
Optionally, the clustering module is further configured to perform one of the following: determining that the set similarity satisfies the clustering condition when the set similarity between two reference operation sequence sets is higher than the set similarity between every other pair of candidate operation sequence sets; or determining that the set similarity satisfies the clustering condition when the set similarity between the two reference operation sequence sets is higher than a similarity threshold.
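The iterative merging described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `cluster_sequences` and `overlap_similarity` are hypothetical, the clustering condition used is the "most similar pair" variant, and the positional-match similarity is an assumed stand-in for whatever sequence similarity the platform actually computes.

```python
from itertools import combinations
from typing import Callable, List, Tuple

OpSequence = Tuple[str, ...]   # one resource operation sequence
SeqSet = List[OpSequence]      # one candidate operation sequence set


def cluster_sequences(
    sequences: List[OpSequence],
    set_similarity: Callable[[SeqSet, SeqSet], float],
    reference_threshold: float,
) -> List[SeqSet]:
    """Merge the most similar pair of sets until every pair is below the threshold."""
    # Each sequence starts out as its own candidate operation sequence set.
    candidates: List[SeqSet] = [[seq] for seq in sequences]
    while len(candidates) > 1:
        # Re-evaluate the set similarity of every pair and pick the best one.
        i, j = max(
            combinations(range(len(candidates)), 2),
            key=lambda p: set_similarity(candidates[p[0]], candidates[p[1]]),
        )
        if set_similarity(candidates[i], candidates[j]) < reference_threshold:
            break  # every pairwise set similarity is now below the threshold
        merged = candidates[i] + candidates[j]
        candidates = [c for k, c in enumerate(candidates) if k not in (i, j)]
        candidates.append(merged)
    return candidates


def overlap_similarity(a: SeqSet, b: SeqSet) -> float:
    """Assumed set similarity: mean positional match rate over all cross-set pairs."""
    def seq_sim(x: OpSequence, y: OpSequence) -> float:
        return sum(p == q for p, q in zip(x, y)) / max(len(x), len(y))
    return sum(seq_sim(x, y) for x in a for y in b) / (len(a) * len(b))
```

Called with a handful of short sequences and a threshold of 0.6, the sketch groups the near-identical sequences into one set and leaves a dissimilar sequence in a set of its own. The threshold-based clustering condition could be sketched by merging any pair above a similarity threshold instead of only the best pair.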
Optionally, the clustering module is configured to: in a case where each of the two obtained candidate operation sequence sets includes one resource operation sequence, take the sequence similarity between the two resource operation sequences as the set similarity; in a case where each of the two obtained candidate operation sequence sets includes a plurality of resource operation sequences, determine the set feature of each candidate operation sequence set according to the sequence features of the resource operation sequences in that set, and determine the feature distance between the set features of the two candidate operation sequence sets as the set similarity; and in a case where a first candidate operation sequence set of the two includes one resource operation sequence and a second candidate operation sequence set includes a plurality of resource operation sequences, determine the set feature of the second candidate operation sequence set according to the sequence features of the resource operation sequences it includes, acquire the sequence feature of the resource operation sequence included in the first candidate operation sequence set, and determine the feature distance between the set feature and the sequence feature as the set similarity.
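The three cases above can be folded into one routine if a set feature is taken to be an aggregate of the sequence features in the set. The sketch below makes two assumptions that the text does not specify: the set feature is the element-wise mean of the sequence features, and the Euclidean feature distance d is mapped to a similarity 1 / (1 + d) so that a larger value still means "more similar". The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def set_feature(features: List[Vector]) -> List[float]:
    """Assumed aggregation: element-wise mean of the sequence features in a set."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def distance_to_similarity(a: Vector, b: Vector) -> float:
    """Assumed mapping from Euclidean feature distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def set_similarity(set_a: List[Vector], set_b: List[Vector]) -> float:
    """Handles all three cases: the mean of a one-sequence set is its own feature."""
    return distance_to_similarity(set_feature(set_a), set_feature(set_b))
```

With this choice, two single-sequence sets are compared directly through their sequence features, and a single-sequence set is compared against the set feature of a multi-sequence set, which matches the case analysis in the paragraph above.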
Optionally, the above clustering module is further configured to: respectively acquiring sequence characteristics of each of a plurality of resource operation sequences through a target characteristic extraction model; and determining the feature distance between every two sequence features in turn, and determining the feature distance as the sequence similarity between the two corresponding resource operation sequences.
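As a rough illustration of this step, the sketch below replaces the target feature extraction model with a simple operation-frequency vector, which is purely an assumption made so that the example is self-contained, and then computes the feature distance for every pair of sequences. The names `extract_feature` and `pairwise_distances` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def extract_feature(sequence: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Stand-in for the feature extraction model: relative frequency of each operation."""
    counts = Counter(sequence)
    return [counts[op] / len(sequence) for op in vocabulary]


def pairwise_distances(
    sequences: List[Sequence[str]], vocabulary: Sequence[str]
) -> Dict[Tuple[int, int], float]:
    """Euclidean feature distance for every pair of sequences, keyed by index pair."""
    features = [extract_feature(seq, vocabulary) for seq in sequences]
    return {
        (i, j): math.dist(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }
```

A learned sequence encoder would normally take the place of `extract_feature`; the pairwise loop is unchanged in that case.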
Optionally, the clustering module is configured to: obtain an operation log of the candidate object account in a target period, where the operation log includes operation records of the resource operations executed on the media resources by the candidate object account on the resource platform; generate a reference resource operation sequence of the candidate object account in the target period by arranging the operation records included in the operation log in time order; and divide the reference resource operation sequence according to a target length to obtain the plurality of resource operation sequences, where each of the plurality of resource operation sequences includes the same number of resource operations.
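The log-to-sequence step can be sketched as below. The record fields and the name `split_operation_log` are hypothetical, and dropping a trailing remainder shorter than the target length is an assumption made so that every resulting sequence contains the same number of operations.

```python
from typing import Iterable, List, NamedTuple, Tuple


class OperationRecord(NamedTuple):
    timestamp: float   # time at which the operation was executed
    operation: str     # illustrative values: "play", "like", "share"
    resource_id: str   # media resource the operation was executed on


def split_operation_log(
    records: Iterable[OperationRecord], target_length: int
) -> List[Tuple[str, ...]]:
    """Order the log by time, then cut it into equal-length operation sequences."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Reference resource operation sequence: all operations in time order.
    reference = [r.operation for r in sorted(records, key=lambda r: r.timestamp)]
    # Assumption: an incomplete trailing segment is discarded.
    usable = len(reference) - len(reference) % target_length
    return [
        tuple(reference[start:start + target_length])
        for start in range(0, usable, target_length)
    ]
```

For a log of seven records and a target length of 3, the sketch yields two sequences of three operations each and ignores the seventh record.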
Optionally, the device for detecting abnormal media resources further includes one of the following: a first obtaining unit configured to obtain a set feature of a third operation sequence set from the at least one operation sequence set, and determine that the third operation sequence set is the abnormal sequence set when a feature similarity between the set feature and a sequence feature of the abnormal operation sequence is greater than or equal to a first threshold; a second obtaining unit, configured to obtain a plurality of third resource operation sequences included in a third operation sequence set in the at least one operation sequence set, and determine that the third operation sequence set is the abnormal sequence set when a proportion of the abnormal operation sequences included in the plurality of third resource operation sequences is greater than or equal to a second threshold.
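Both alternatives for deciding whether a set is the abnormal sequence set can be sketched in a few lines. The function names are hypothetical, cosine similarity is an assumed choice of feature similarity, and exact membership in a collection of known abnormal operation sequences is an assumed way of deciding that a sequence is abnormal.

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_abnormal_by_feature(
    set_feature: Sequence[float],
    abnormal_feature: Sequence[float],
    first_threshold: float,
) -> bool:
    """First alternative: set feature is close enough to an abnormal sequence feature."""
    return cosine_similarity(set_feature, abnormal_feature) >= first_threshold


def is_abnormal_by_proportion(
    sequence_set: List[Tuple[str, ...]],
    known_abnormal: Set[Tuple[str, ...]],
    second_threshold: float,
) -> bool:
    """Second alternative: share of abnormal sequences in the set reaches the threshold."""
    hits = sum(1 for seq in sequence_set if seq in known_abnormal)
    return hits / len(sequence_set) >= second_threshold
```

A candidate object account would then be treated as abnormal as soon as either check returns true for at least one of its operation sequence sets.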
Optionally, the detection unit includes at least one of the following: a first detection module, configured to acquire the media resource set associated with the abnormal object account, extract the resource operation feature of each media resource in the media resource set, where the resource operation feature indicates the change trend of each of a plurality of resource indexes of the media resource, and determine the abnormal media resources from the media resource set according to the resource operation features; and a second detection module, configured to acquire the media resource sets associated with each of a plurality of abnormal object accounts, and determine the media resources included in the intersection of the media resource sets as the abnormal media resources.
Optionally, the detecting device for abnormal media resources further includes at least one of the following: the pushing unit is used for reducing the pushing probability of pushing the abnormal media resources to the object account on the resource platform; the prompting unit is used for pushing prompting information to the target object account for issuing the abnormal media resources; and the setting unit is used for setting the abnormal media resource to be in an inaccessible state on the resource platform.
Optionally, the second detection module is further configured to: acquiring a first media resource set associated with a first abnormal object account; traversing each first media resource in the first media resource set, and acquiring an object account set corresponding to each first media resource, wherein the object account set comprises a plurality of object accounts for executing the resource operation on the first media resource; and determining a second abnormal object account from the object account set.
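The second detection module can be illustrated with plain set operations. In this sketch the mappings `resources_by_account` and `accounts_by_resource` and the callback `is_abnormal` are hypothetical: the text does not say how the second abnormal object account is chosen from the object account set, so the check is passed in as a function (for example, the sequence clustering test applied to that account).

```python
from typing import Callable, Dict, Set


def find_abnormal_resources(resources_by_account: Dict[str, Set[str]]) -> Set[str]:
    """Media resources operated on by every abnormal object account (set intersection)."""
    resource_sets = list(resources_by_account.values())
    if len(resource_sets) < 2:
        return set()  # assumption: an intersection needs at least two accounts
    return set.intersection(*resource_sets)


def expand_abnormal_accounts(
    first_account: str,
    resources_by_account: Dict[str, Set[str]],
    accounts_by_resource: Dict[str, Set[str]],
    is_abnormal: Callable[[str], bool],
) -> Set[str]:
    """Walk the first account's media resources and collect other abnormal accounts."""
    found: Set[str] = set()
    for resource in resources_by_account[first_account]:
        # Object account set: accounts that executed operations on this resource.
        for account in accounts_by_resource.get(resource, set()):
            if account != first_account and is_abnormal(account):
                found.add(account)
    return found
```

The accounts returned by `expand_abnormal_accounts` can be added to the mapping passed to `find_abnormal_resources`, so that the intersection is taken over all abnormal object accounts found so far.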
Optionally, in this embodiment, for the implementation of each unit and module, reference may be made to the method embodiments described above; details are not repeated here.
According to still another aspect of the embodiment of the present invention, there is further provided an electronic device for implementing the method for detecting an abnormal media resource as described above, where the electronic device may be a terminal device or a server as shown in fig. 15. The present embodiment is described taking the electronic device as a terminal device as an example. As shown in fig. 15, the electronic device comprises a memory 1502 and a processor 1504, the memory 1502 having stored therein a computer program, the processor 1504 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of a computer program:
S1, acquiring a plurality of resource operation sequences of a candidate object account, wherein the resource operation sequences comprise a plurality of continuous resource operations which are executed on a media resource by the candidate object account on a resource platform;
S2, performing a clustering operation on the plurality of resource operation sequences according to the sequence similarity among the resource operation sequences to obtain at least one operation sequence set;
S3, determining the candidate object account as an abnormal object account when the at least one operation sequence set comprises an abnormal sequence set, wherein the sequence similarity between a resource operation sequence included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold;
S4, determining abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises media resources corresponding to the resource operations executed by the abnormal object account.
Optionally, it will be understood by those skilled in the art that the structure shown in fig. 15 is only schematic, and the electronic device may also be a vehicle-mounted terminal, a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), a PAD, or the like. Fig. 15 does not limit the structure of the electronic device described above. For example, the electronic device may include more or fewer components (e.g., a network interface) than shown in fig. 15, or have a configuration different from that shown in fig. 15.
The memory 1502 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and device for detecting abnormal media resources in the embodiments of the present invention. The processor 1504 executes the software programs and modules stored in the memory 1502 to perform various functional applications and data processing, that is, to implement the above method for detecting abnormal media resources. The memory 1502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1502 may further include memory located remotely from the processor 1504, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1502 may be used to store, but is not limited to, file information such as a target logical file. As an example, as shown in fig. 15, the memory 1502 may include, but is not limited to, the obtaining unit 1402, the clustering unit 1404, the determining unit 1406, and the detecting unit 1408 of the above device for detecting abnormal media resources. It may also include, but is not limited to, other module units of the device, which are not described in detail in this example.
Optionally, the transmission device 1506 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1506 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1506 is a Radio Frequency (RF) module that is configured to communicate wirelessly with the internet.
In addition, the electronic device further includes: a display 1508, and a connection bus 1510 for connecting the respective module components in the above-mentioned electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system formed by a plurality of nodes connected through network communication. The nodes may form a peer-to-peer (P2P) network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the peer-to-peer network.
According to one aspect of the present application, a computer program product is provided, comprising a computer program/instructions containing program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. When executed by a central processing unit, the computer program performs the various functions provided by the embodiments of the present application.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
According to one aspect of the present application, there is provided a computer-readable storage medium storing computer instructions. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above method for detecting abnormal media resources.
Optionally, in this embodiment, the computer-readable storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring a plurality of resource operation sequences of a candidate object account, wherein the resource operation sequences comprise a plurality of continuous resource operations which are executed on a media resource by the candidate object account on a resource platform;
S2, performing a clustering operation on the plurality of resource operation sequences according to the sequence similarity among the resource operation sequences to obtain at least one operation sequence set;
S3, determining the candidate object account as an abnormal object account when the at least one operation sequence set comprises an abnormal sequence set, wherein the sequence similarity between a resource operation sequence included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold;
S4, determining abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises media resources corresponding to the resource operations executed by the abnormal object account.
Optionally, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing a terminal device to execute the steps, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The device embodiments described above are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.
Claims (14)
1. A method for detecting an abnormal media resource, comprising:
acquiring a plurality of resource operation sequences of a candidate object account, wherein the resource operation sequences comprise a plurality of continuous resource operations which are executed on a media resource by the candidate object account on a resource platform;
respectively obtaining the sequence similarity between every two resource operation sequences in the plurality of resource operation sequences, and executing a clustering operation on the plurality of resource operation sequences according to the sequence similarity to obtain at least one operation sequence set;
determining the candidate object account as an abnormal object account under the condition that at least one operation sequence set comprises an abnormal sequence set, wherein the sequence similarity between the resource operation sequence and the abnormal operation sequence included in the abnormal sequence set is greater than or equal to a target threshold;
and determining abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises media resources corresponding to the resource operation executed by the abnormal object account.
2. The method of claim 1, wherein performing a clustering operation on a plurality of the resource operation sequences according to the sequence similarity to obtain at least one operation sequence set comprises:
respectively determining a plurality of resource operation sequences as a candidate operation sequence set, and repeating the following steps until the set similarity between every two candidate operation sequence sets is smaller than a reference threshold value:
respectively determining the set similarity between every two candidate operation sequence sets according to the sequence similarity, and clustering the two candidate operation sequence sets whose set similarity meets a clustering condition to obtain an updated candidate operation sequence set;
and acquiring the set similarity between every two candidate operation sequence sets according to the updated candidate operation sequence sets and the rest candidate operation sequence sets.
3. The method according to claim 2, further comprising, before the clustering operation of the two operation sequence sets whose set similarity satisfies a clustering condition, one of:
determining that the set similarity satisfies the clustering condition when the set similarity between two reference operation sequence sets is higher than the set similarity between every two other candidate operation sequence sets;
and determining that the set similarity meets the clustering condition under the condition that the set similarity between two reference operation sequence sets is higher than a similarity threshold value.
4. The method of claim 2, wherein said determining the set similarity between each two sets of candidate sequences of operations based on the sequence similarity comprises:
under the condition that each of the two obtained candidate operation sequence sets comprises one resource operation sequence, taking the sequence similarity between the two resource operation sequences as the set similarity;
under the condition that each of the two obtained candidate operation sequence sets comprises a plurality of resource operation sequences, determining set features of the candidate operation sequence sets according to sequence features of each of the plurality of resource operation sequences included in the candidate operation sequence sets, and determining feature distances between the set features of each of the two candidate operation sequence sets as the set similarity;
determining, in a case where a first candidate operation sequence set of the two obtained candidate operation sequence sets comprises one resource operation sequence and a second candidate operation sequence set of the two candidate operation sequence sets comprises a plurality of resource operation sequences, the set feature of the second candidate operation sequence set according to the sequence features of each of the plurality of resource operation sequences included in the second candidate operation sequence set; and acquiring the sequence feature of the resource operation sequence included in the first candidate operation sequence set, and determining the feature distance between the set feature and the sequence feature as the set similarity.
5. The method of claim 2, wherein before determining the plurality of resource operation sequences as a candidate operation sequence set, further comprising:
respectively acquiring sequence characteristics of each resource operation sequence in a plurality of resource operation sequences through a target characteristic extraction model;
and determining feature distances between every two sequence features in sequence, and determining the feature distances as the sequence similarity between the two corresponding resource operation sequences.
6. The method of claim 2, wherein the acquiring a plurality of resource operation sequences of a candidate object account comprises:
obtaining an operation log of the candidate object account in a target period, wherein the operation log comprises an operation record of the resource operation, executed by the candidate object account on the resource platform, on the media resource;
generating a reference resource operation sequence of the candidate object account in the target period according to the operation record included in the operation log and the time sequence;
and dividing the reference resource operation sequence according to a target length to obtain a plurality of resource operation sequences, wherein each resource operation sequence in the plurality of resource operation sequences comprises the same number of resource operations.
7. The method according to claim 1, wherein, before determining the candidate object account as an abnormal object account in the case where the at least one operation sequence set includes an abnormal sequence set, the method further comprises one of:
acquiring set features of a third operation sequence set in the at least one operation sequence set, and determining the third operation sequence set as the abnormal sequence set under the condition that feature similarity between the set features and sequence features of the abnormal operation sequence is greater than or equal to a first threshold;
acquiring a plurality of third resource operation sequences included in a third operation sequence set in the at least one operation sequence set, and determining that the third operation sequence set is the abnormal sequence set when the proportion of the abnormal operation sequences included in the plurality of third resource operation sequences is greater than or equal to a second threshold value.
8. The method of claim 1, wherein the determining an abnormal media resource from the set of media resources associated with the abnormal object account comprises at least one of:
acquiring a media resource set associated with the abnormal object account, and respectively extracting resource operation characteristics of each media resource in the media resource set, wherein the resource operation characteristics are used for indicating respective corresponding change trends of a plurality of resource indexes of the media resource; determining the abnormal media resources from the media resource set according to the resource operation characteristics;
respectively acquiring the media resource sets associated with each of a plurality of the abnormal object accounts, and determining the media resources included in the intersection of the media resource sets as the abnormal media resources.
9. The method of claim 8, wherein after determining an abnormal media resource from the set of media resources associated with the abnormal object account, further comprising at least one of:
the pushing probability of pushing the abnormal media resource to the object account on the resource platform is reduced;
pushing prompt information to a target object account for releasing the abnormal media resource;
setting the abnormal media resource to an inaccessible state on the resource platform.
10. The method of claim 8, wherein, before respectively acquiring the media resource sets associated with each of a plurality of the abnormal object accounts and determining the media resources included in the intersection of the media resource sets as the abnormal media resources, the method further comprises:
acquiring a first media resource set associated with a first abnormal object account;
traversing each first media resource in the first media resource set, and acquiring an object account set corresponding to each first media resource, wherein the object account set comprises a plurality of object accounts for executing the resource operation on the first media resource;
and determining a second abnormal object account from the object account set.
11. A device for detecting an abnormal media resource, comprising:
an acquisition unit, configured to acquire a plurality of resource operation sequences of a candidate object account, wherein the resource operation sequences comprise a plurality of continuous resource operations which are executed on a media resource by the candidate object account on a resource platform;
a clustering unit, configured to perform a clustering operation on the plurality of resource operation sequences according to the sequence similarity between the resource operation sequences, so as to obtain at least one operation sequence set;
a determining unit, configured to determine the candidate object account as an abnormal object account when the at least one operation sequence set includes an abnormal sequence set, wherein the sequence similarity between a resource operation sequence included in the abnormal sequence set and an abnormal operation sequence is greater than or equal to a target threshold;
a detection unit, configured to determine abnormal media resources from a media resource set associated with the abnormal object account, wherein the media resource set comprises media resources corresponding to the resource operations executed by the abnormal object account.
12. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 10.
13. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 10.
14. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 10 by means of the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311205268.4A CN117294873A (en) | 2023-09-18 | 2023-09-18 | Abnormal media resource detection method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311205268.4A CN117294873A (en) | 2023-09-18 | 2023-09-18 | Abnormal media resource detection method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117294873A (en) | 2023-12-26 |
Family
ID=89238259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311205268.4A Pending CN117294873A (en) | Abnormal media resource detection method and device, storage medium and electronic equipment | 2023-09-18 | 2023-09-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117294873A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117793464A (en) * | 2023-12-27 | 2024-03-29 | 北京新联财通咨询有限公司 | Interactive data processing method and device for video works, storage medium and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP4242878A1 (en) | Method and apparatus for training isolation forest, and method and apparatus for recognizing web crawler | |
US9288124B1 (en) | Systems and methods of classifying sessions | |
CN103793484B (en) | The fraud identifying system based on machine learning in classification information website | |
CN107862022B (en) | Culture resource recommendation system | |
CN107153656B (en) | Information searching method and device | |
CN103530365A (en) | Method and system for acquiring downloading link of resources | |
CN112751711B (en) | Alarm information processing method and device, storage medium and electronic equipment | |
CN106327230B (en) | Abnormal user detection method and equipment | |
CN113271322B (en) | Abnormal flow detection method and device, electronic equipment and storage medium | |
CN106339507A (en) | Method and device for pushing streaming media message | |
US20220156795A1 (en) | Segment content optimization delivery system and method | |
CN109063736B (en) | Data classification method and device, electronic equipment and computer readable storage medium | |
CN110910204A (en) | User monitoring system based on artificial intelligence | |
CN114187036A (en) | Internet advertisement intelligent recommendation management system based on behavior characteristic recognition | |
CN117294873A (en) | Abnormal media resource detection method and device, storage medium and electronic equipment | |
CN112100221A (en) | Information recommendation method and device, recommendation server and storage medium | |
CN112437034B (en) | False terminal detection method and device, storage medium and electronic device | |
CN106294406A (en) | A kind of method and apparatus accessing data for processing application | |
CN109062945B (en) | Information recommendation method, device and system for social network | |
CN114780606A (en) | Big data mining method and system | |
CN114245185A (en) | Video recommendation method, model training method, device, electronic equipment and medium | |
CN110708296B (en) | VPN account number collapse intelligent detection model based on long-time behavior analysis | |
CN113283484A (en) | Improved feature selection method, device and storage medium | |
Liang et al. | Predicting network response times using social information | |
CN113570409B (en) | Determination method and device for conversion event weight value, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |