Disclosure of Invention
The embodiments of the present specification aim to provide a more efficient scheme for the gray release of new products, so as to overcome the deficiencies of the prior art.
To achieve the above object, one aspect of the present specification provides a method for gray release of a new product in a system, including: obtaining a plurality of first products in the system and a plurality of keywords included in the plurality of first products, wherein the plurality of first products include a new product to be released; obtaining a weight of each keyword of the new product based on the plurality of first products and the plurality of keywords included in the plurality of first products; obtaining a keyword vector of the new product according to the weights; calculating a similarity between a keyword vector of a user acquired in advance and the keyword vector of the new product; and, in a case where the similarity is greater than a predetermined threshold, determining that the user is a target user of the new product and releasing the new product to the target user.
In one embodiment, in the method for gray release of a new product in a system, the new product is an updated version of an old-version product, and determining that the user is a target user of the new product includes causing the user to use the new product when the user enters the old-version product.
In one embodiment, in the method for gray release of a new product in a system, the plurality of keywords included in the plurality of first products are keywords contained in the texts of the plurality of first products.
In one embodiment, in the method for gray release of a new product in a system, obtaining the weight of each keyword of the new product includes obtaining the weight of each keyword of the new product through a TF-IDF algorithm.
In one embodiment, in the method for gray release of a new product in a system, the keyword vector of the user acquired in advance is obtained by the following steps: obtaining a plurality of second products in the system and a plurality of keywords included in the plurality of second products, wherein the second products are open for use by the user; obtaining preference data of the user regarding the plurality of second products; dividing the plurality of second products into a positive sample set and a negative sample set according to the preference data; obtaining a weight of each keyword of each second product based on the plurality of second products and the plurality of keywords included in the plurality of second products; obtaining a keyword vector of each second product according to the weight of each keyword of that second product; and calculating the keyword vector of the user by a Rocchio algorithm according to the positive sample set, the negative sample set and the keyword vector of each second product.
In one embodiment, in the method for gray release of a new product in a system, the plurality of second products are products included in the plurality of first products.
In one embodiment, in the method for gray release of a new product in a system, the preference data includes at least one of the following: the frequency with which the user uses the product, the rating given to the product by the user, and the number of times the user has recently used the product.
Another aspect of the present specification provides an apparatus for gray release of a new product in a system, including: a first obtaining unit configured to obtain a plurality of first products in the system and a plurality of keywords included in the plurality of first products, wherein the plurality of first products include a new product to be released; a second obtaining unit configured to obtain a weight of each keyword of the new product based on the plurality of first products and the plurality of keywords included in the plurality of first products; a third obtaining unit configured to obtain a keyword vector of the new product according to the weights; a calculating unit configured to calculate a similarity between a keyword vector of a user acquired in advance and the keyword vector of the new product; and a determining unit configured to determine, in a case where the similarity is greater than a predetermined threshold, that the user is a target user of the new product, and to release the new product to the target user.
In one embodiment, in the apparatus for gray release of a new product in a system, the new product is an updated version of an old-version product, and the determining unit is further configured to cause the user to use the new product when the user enters the old-version product.
In one embodiment, in the apparatus for gray release of a new product in a system, the second obtaining unit is further configured to obtain the weight of each keyword of the new product through a TF-IDF algorithm.
In one embodiment, the apparatus for gray release of a new product in a system further includes a fourth obtaining unit configured to obtain the keyword vector of the user in advance, where the fourth obtaining unit specifically includes: a first obtaining subunit configured to obtain a plurality of second products in the system and a plurality of keywords included in the plurality of second products, wherein the second products are open for use by the user; a second obtaining subunit configured to obtain preference data of the user regarding the plurality of second products; a dividing subunit configured to divide the plurality of second products into a positive sample set and a negative sample set according to the preference data; a third obtaining subunit configured to obtain a weight of each keyword of each second product based on the plurality of second products and the plurality of keywords included in the plurality of second products; a fourth obtaining subunit configured to obtain a keyword vector of each second product according to the weight of each keyword of that second product; and a calculating subunit configured to calculate the keyword vector of the user by a Rocchio algorithm according to the positive sample set, the negative sample set and the keyword vector of each second product.
According to the gray-release scheme of the embodiments of the present specification, active users can be effectively targeted during the gray release of a new product, so that the pace of the gray release can be effectively controlled, the effect of the gray release is ensured, and problems in the new product can be collected and resolved in a timely manner.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of the present specification. As shown in FIG. 1, the system 100 includes a product vector obtaining unit 11, a user vector obtaining unit 12, and a similarity calculation unit 13. The product vector obtaining unit 11 obtains a set of first products and the set of keywords included in the first products, where the first products include a new product to be released in gray release. Based on the first products and the keyword set, the weight of each keyword of the new product is obtained; a keyword vector of the new product is then obtained from the weights and sent to the similarity calculation unit 13. The product vector obtaining unit 11 also obtains a set of second products that are open for use by the user, together with the keyword set included in them; in the same way, the weight of each keyword of each second product is obtained, so that a keyword vector of each second product is obtained and sent to the user vector obtaining unit 12. The user vector obtaining unit 12 obtains the user's preference data for each second product, so that the set of second products can be divided into a positive sample set and a negative sample set according to the preference data. Using the Rocchio algorithm, a keyword vector of the user is calculated based on the positive sample set, the negative sample set, and the keyword vector of each second product, and is sent to the similarity calculation unit 13.
The similarity calculation unit 13 calculates the similarity between the keyword vector of the new product and the keyword vector of the user that it receives, and in a case where the similarity is greater than a predetermined threshold, it can be determined that the user is a gray-release target user of the new product.
FIG. 2 shows a flowchart of a method for gray release of a new product in a system according to an embodiment of the present specification. The method includes: in step S21, obtaining a plurality of first products in the system and a plurality of keywords included in the plurality of first products, where the plurality of first products include a new product to be released; in step S22, obtaining a weight of each keyword of the new product based on the plurality of first products and the plurality of keywords included in the plurality of first products; in step S23, obtaining a keyword vector of the new product according to the weights; in step S24, calculating a similarity between a keyword vector, acquired in advance, of a user in the system and the keyword vector of the new product; and in step S25, in a case where the similarity is greater than a predetermined threshold, determining that the user is a target user of the new product and releasing the new product to the target user.
First, in step S21, a plurality of first products in the system and a plurality of keywords included in the plurality of first products are obtained, where the plurality of first products include a new product to be released. The system may be, for example, a website on the Internet or an APP on a terminal device, such as the Alipay APP. The plurality of first products may be, for example, all existing products included in the APP together with the new product to be released in gray release. For example, the plurality of first products include N products, and the set of the N products is:
$$D = \{d_1, d_2, \ldots, d_N\}.$$
A plurality of keywords may be obtained from each product. The keywords may be obtained from the text included in the product. In one embodiment, the product developer determines which word segments in the product text can serve as keywords. In one embodiment, word segments carrying a larger amount of information in the product text are selected as the keywords of the product. The keyword set T of the plurality of first products may be obtained by collecting the keywords of each product and merging them. For example, the keyword set T includes n keywords:
$$T = \{t_1, t_2, \ldots, t_n\}.$$
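As an illustration of how the keyword set T might be assembled from per-product keyword lists, the following minimal Python sketch merges the lists while preserving first-seen order. The function name `build_keyword_set`, the product "transfer" and the keyword "payee" are hypothetical and serve only as an example; how keywords are extracted from product text is not covered here.

```python
from typing import Dict, List


def build_keyword_set(product_keywords: Dict[str, List[str]]) -> List[str]:
    """Merge the keyword lists of all products into one ordered keyword set T."""
    seen = set()
    keyword_set: List[str] = []
    for keywords in product_keywords.values():
        for kw in keywords:
            if kw not in seen:  # keep each keyword once, in first-seen order
                seen.add(kw)
                keyword_set.append(kw)
    return keyword_set


# Hypothetical products and keywords, for illustration only.
products = {
    "footprint 2.0": ["credit", "bill", "action"],
    "transfer": ["bill", "payee"],
}
T = build_keyword_set(products)
print(T)  # ['credit', 'bill', 'action', 'payee']
```

Keeping a fixed order for T matters because the position of a keyword in T determines its dimension in every keyword vector built later.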
In step S22, the weight of each keyword of the new product is obtained based on the plurality of first products and the plurality of keywords included in the plurality of first products. For example, the weight of the i-th keyword in the keyword set T in the new product j is $w_{ij}$. The weight $w_{ij}$ may be assigned in a number of ways.
In one embodiment, each keyword in the new product j is assumed to be equally important: when the i-th keyword is included in the new product j, $w_{ij} = 1$; when the i-th keyword is not included in the new product j, $w_{ij} = 0$. For example, if the new product is footprint 2.0, which includes the keywords "credit, bill, action", and these are, say, the first, second and third keywords in the keyword set T, then for the new product j, $w_{1j} = 1$, $w_{2j} = 1$, $w_{3j} = 1$, and $w_{ij} = 0$ for i = 4 to n.
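The equal-importance assignment can be sketched in a few lines of Python. The keyword set below extends the footprint 2.0 example with two hypothetical keywords ("payee", "transfer") so that the zero entries are visible; the function name `binary_weights` is likewise only illustrative.

```python
from typing import List


def binary_weights(keyword_set: List[str], product_keywords: List[str]) -> List[int]:
    """Return w_ij = 1 if keyword i of T occurs in the product, else 0."""
    present = set(product_keywords)
    return [1 if kw in present else 0 for kw in keyword_set]


# Keyword set T; the last two keywords are hypothetical.
T = ["credit", "bill", "action", "payee", "transfer"]
w_j = binary_weights(T, ["credit", "bill", "action"])
print(w_j)  # [1, 1, 1, 0, 0]
```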
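In one embodiment, the weight of each keyword of the new product is obtained by the TF-IDF algorithm shown in formulas (1) and (2) below:
$$\mathrm{TF\text{-}IDF}(i,j) = \mathrm{TF}(i,j) \cdot \mathrm{IDF}(i) \quad (1)$$
$$\mathrm{IDF}(i) = \log\frac{N}{n(i)} \quad (2)$$
where TF(i,j) in formula (1) is the normalized word frequency of the keyword i in the product j, calculated by formula (3), for example in the form
$$\mathrm{TF}(i,j) = \frac{n_{i,j}}{\max_{k} n_{k,j}} \quad (3)$$
where $n_{i,j}$ is the number of times the keyword i appears in the product j, and the maximum is taken over all keywords k in the product j.
n(i) in formula (2) is the number of products, among the N products, in which the keyword i appears. The weight $w_{ij}$ is obtained by normalizing TF-IDF(i,j), as shown in formula (4), for example in the form
$$w_{ij} = \frac{\mathrm{TF\text{-}IDF}(i,j)}{\sqrt{\sum_{s=1}^{n} \mathrm{TF\text{-}IDF}(s,j)^{2}}} \quad (4)$$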
The weight of each keyword of the new product is obtained by the TF-IDF algorithm, in which the weight of a keyword is determined by the number of times the keyword appears in the product and by the number of other products that include the keyword: the more frequently the keyword appears in the product, the larger its weight; the more other products the keyword appears in, the smaller its weight. The weight of each keyword of the product is thereby defined more accurately.
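The behaviour described above can be checked with a small Python sketch. It assumes one common TF-IDF variant: term frequency normalized by the most frequent keyword in the product, a natural-logarithm IDF, and scaling of each product's weights to unit length. These choices, the function name `tfidf_weights`, and the products "transfer" and "wallet" are assumptions made for illustration, not necessarily the exact formulas of the embodiment.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(product_tokens: Dict[str, List[str]],
                  keyword_set: List[str]) -> Dict[str, List[float]]:
    """Compute a unit-length TF-IDF weight vector over keyword_set for each product."""
    n_products = len(product_tokens)
    counts = {name: Counter(tokens) for name, tokens in product_tokens.items()}
    # n(i): number of products in which keyword i appears
    doc_freq = {kw: sum(1 for c in counts.values() if c[kw] > 0)
                for kw in keyword_set}
    weights: Dict[str, List[float]] = {}
    for name, c in counts.items():
        max_count = max((c[kw] for kw in keyword_set), default=0)
        raw = []
        for kw in keyword_set:
            if max_count == 0 or doc_freq[kw] == 0:
                raw.append(0.0)
                continue
            tf = c[kw] / max_count                       # normalized word frequency
            idf = math.log(n_products / doc_freq[kw])    # rarer keyword, larger IDF
            raw.append(tf * idf)
        norm = math.sqrt(sum(x * x for x in raw))
        weights[name] = [x / norm if norm > 0 else 0.0 for x in raw]
    return weights


# Hypothetical keyword occurrences per product, for illustration only.
products = {
    "footprint 2.0": ["credit", "bill", "bill", "action"],
    "transfer": ["bill", "payee"],
    "wallet": ["bill", "credit"],
}
T = ["credit", "bill", "action", "payee"]
weights = tfidf_weights(products, T)
print(weights["footprint 2.0"])
```

In this data "bill" appears in every product, so its weight in footprint 2.0 is zero even though it is the most frequent keyword there, while "action", which appears in no other product, receives the largest weight.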
In step S23, a keyword vector of the new product is obtained according to the weights. For example, for a new product j, after the respective weights of the n keywords of the keyword set in the product j are obtained, the keyword vector $\vec{w}_j$ of the new product may be obtained, where
$$\vec{w}_j = (w_{1j}, w_{2j}, \ldots, w_{nj})^{T}.$$
For example, for the new product footprint 2.0 in step S22, the keyword vector $(1, 1, 1, 0, \ldots, 0)^{T}$ may be obtained.
In step S24, the similarity between the keyword vector of the user acquired in advance and the keyword vector of the new product is calculated.
First, acquisition of a user keyword vector is explained with reference to fig. 3. Fig. 3 shows a flowchart of a method for obtaining a user keyword vector according to an embodiment of the present specification.
As shown in FIG. 3, in step S31, a plurality of second products in the system and a plurality of keywords included in the plurality of second products are obtained, wherein the second products are open for use by the user. The plurality of second products may be, for example, all products in the APP other than the new product. It should be understood that the plurality of second products are not limited to products belonging to the plurality of first products, as long as they are open for use by the user and the user's preference data for them can be obtained.
In step S32, preference data of the user with respect to the plurality of second products is acquired. Wherein the preference data comprises at least one of: frequency of use of the product by the user, rating of the product by the user, and number of recent uses of the product by the user.
In step S33, the plurality of second products are divided into a positive sample set and a negative sample set according to the preference data. The positive sample set is the set of products that the user likes, and the negative sample set is the set of products that the user dislikes. In one embodiment, it is determined that the user likes a product when the user's frequency of use of the product (e.g., the number of uses per day or per week) is greater than a predetermined threshold. In one embodiment, it is determined that the user likes a product when the user's rating of the product is higher than a predetermined score. In one embodiment, it is determined that the user likes a product when the number of times the user has used the product recently (e.g., within 2 days, within 3 days, or within a week) is greater than or equal to a predetermined number; for example, it is determined that the user likes the product when the user has used it at least once within 3 days. In one embodiment, the items of preference data are considered together to determine the user's preference for the product.
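A minimal Python sketch of this division is given below. The `Preference` record, the threshold values, and the rule that any single satisfied criterion marks a product as liked are all assumptions chosen for illustration; the embodiments leave the thresholds and the way the criteria are combined open.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Preference:
    uses_per_week: int      # frequency of use
    rating: float           # user's rating of the product
    uses_last_3_days: int   # number of recent uses


def split_samples(prefs: Dict[str, Preference],
                  min_uses_per_week: int = 3,
                  min_rating: float = 4.0,
                  min_recent_uses: int = 1) -> Tuple[List[str], List[str]]:
    """Divide products into (positive, negative) sample sets.

    Assumption: a product is liked if ANY one criterion is met.
    """
    positive: List[str] = []
    negative: List[str] = []
    for product, p in prefs.items():
        liked = (p.uses_per_week > min_uses_per_week
                 or p.rating > min_rating
                 or p.uses_last_3_days >= min_recent_uses)
        (positive if liked else negative).append(product)
    return positive, negative


# Hypothetical preference data, for illustration only.
prefs = {
    "footprint 1.0": Preference(uses_per_week=1, rating=0.0, uses_last_3_days=1),
    "transfer": Preference(uses_per_week=0, rating=2.0, uses_last_3_days=0),
}
positive, negative = split_samples(prefs)
print(positive, negative)  # ['footprint 1.0'] ['transfer']
```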
In step S34, a weight of each keyword of each second product is obtained based on the plurality of second products and the plurality of keywords included in the plurality of second products; in step S35, a keyword vector of each second product is obtained according to the weight of each keyword of that second product. Steps S34 and S35 are substantially the same as steps S22 and S23 described with reference to FIG. 2, and are not described again here.
In step S36, a keyword vector of the user is obtained by the Rocchio algorithm according to the positive sample set, the negative sample set and the keyword vector of each second product. According to the Rocchio algorithm, the keyword vector $\vec{u}$ of the user is obtained by formula (5), for example in the form
$$\vec{u} = \beta \cdot \frac{1}{|I_r|} \sum_{j \in I_r} \vec{w}_j \;-\; \gamma \cdot \frac{1}{|I_{nr}|} \sum_{k \in I_{nr}} \vec{w}_k \quad (5)$$
where $I_r$ is the positive sample set, $I_{nr}$ is the negative sample set, $\vec{w}_j$ is the keyword vector of a product belonging to the positive sample set, $\vec{w}_k$ is the keyword vector of a product belonging to the negative sample set, and β and γ are the weights of the positive and negative sample sets, whose values are determined by the system according to the distribution of the positive and negative samples.
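The Rocchio step can be sketched in Python as the weighted difference between the centroid of the positive samples and the centroid of the negative samples. Averaging each set by its size, the default values of `beta` and `gamma`, and the example vectors are assumptions for illustration; in the embodiments the weights are chosen by the system from the sample distribution.

```python
from typing import Dict, List, Sequence


def rocchio_user_vector(vectors: Dict[str, Sequence[float]],
                        positive: List[str],
                        negative: List[str],
                        beta: float = 1.0,
                        gamma: float = 0.5) -> List[float]:
    """Return beta * centroid(positive) - gamma * centroid(negative)."""
    if not vectors:
        raise ValueError("at least one product keyword vector is required")
    dim = len(next(iter(vectors.values())))

    def centroid(names: List[str]) -> List[float]:
        if not names:  # an empty sample set contributes nothing
            return [0.0] * dim
        return [sum(vectors[n][i] for n in names) / len(names) for i in range(dim)]

    pos = centroid(positive)
    neg = centroid(negative)
    return [beta * p - gamma * q for p, q in zip(pos, neg)]


# Hypothetical keyword vectors over T = (credit, bill, action, payee).
vectors = {
    "footprint 1.0": [1, 1, 1, 0],
    "transfer": [0, 1, 0, 1],
}
u = rocchio_user_vector(vectors, positive=["footprint 1.0"], negative=["transfer"])
print(u)  # [1.0, 0.5, 1.0, -0.5]
```

Keywords that occur only in disliked products end up with negative components, which lowers the similarity to any new product that emphasizes those keywords.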
The method for obtaining the keyword vector of the user described above with reference to fig. 3 may be performed periodically, for example, once per week, or once per day, so as to continuously update the keyword vector of the user according to the usage of the product in the APP by the user. The method for acquiring the user keyword vector can also be performed in real time when the user keyword vector needs to be used, for example, when the method for releasing a new product in grayscale according to the embodiment of the present specification is implemented, the keyword vector of the user is calculated in real time, so that a more accurate user vector can be provided.
Returning to FIG. 2, after the keyword vector $\vec{u}$ of the user is obtained as described above, the similarity between the keyword vector $\vec{u}$ of the user and the keyword vector $\vec{w}_j$ of the new product may be calculated.
The similarity between two vectors can be calculated in a number of ways, for example using the Euclidean distance, the Manhattan distance, the Minkowski distance, or the cosine similarity. Preferably, the similarity between the keyword vector $\vec{u}$ of the user and the keyword vector $\vec{w}_j$ of the new product is calculated by the following cosine similarity formula (6):
$$\mathrm{sim}(\vec{u}, \vec{w}_j) = \frac{\vec{u} \cdot \vec{w}_j}{\|\vec{u}\| \, \|\vec{w}_j\|} \quad (6)$$
The similarity has a value range of [-1, 1]; the closer the value is to 1, the closer the two vectors are. In a case where the plurality of second products do not all belong to the plurality of first products, the feature dimensions of the keyword vector $\vec{u}$ of the user and the keyword vector $\vec{w}_j$ of the new product may differ; the similarity between the two vectors may then be calculated after padding the missing feature dimensions of each vector with 0.
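The cosine similarity with zero padding can be sketched in Python by storing each vector as a mapping from keyword to weight, so that a keyword missing from one vector is treated as a 0 entry. The function name, the example weights, and the convention of returning 0.0 when either vector has zero length are assumptions for illustration.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], w: Dict[str, float]) -> float:
    """Cosine similarity of two keyword vectors; missing keywords count as 0."""
    keys = set(u) | set(w)  # union of feature dimensions (zero padding)
    dot = sum(u.get(k, 0.0) * w.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(v * v for v in u.values()))
    norm_w = math.sqrt(sum(v * v for v in w.values()))
    if norm_u == 0 or norm_w == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_w)


# Hypothetical vectors: the user vector has a "payee" dimension the product lacks.
user_vec = {"credit": 1.0, "bill": 0.5, "action": 1.0, "payee": -0.5}
new_product_vec = {"credit": 1.0, "bill": 1.0, "action": 1.0}
sim = cosine_similarity(user_vec, new_product_vec)
print(round(sim, 4))  # 0.9129
```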
In step S25, in a case where the similarity is greater than the predetermined threshold, it is determined that the user is a target user of the new product, and the new product is released to the target user. For example, when the above cosine similarity is used, the predetermined threshold may be set to 0.9. A similarity greater than the predetermined threshold of 0.9 indicates that the keyword vector of the user and the keyword vector of the new product overlap to a high degree, so the probability that the user likes the new product, that is, the probability that the user will use the new product, is high. The user is therefore determined to be a target user of the new product; the new product may, for example, be shown to the user in the APP, or the user may be invited to use the new product by means of a notification. In this way, active users can be effectively targeted during the gray release of the new product, so that problems can be collected and resolved quickly.
In one embodiment, the new product is an updated version of an old-version product, and in a case where the similarity is greater than the predetermined threshold, the user is caused to use the new product when the user enters the old-version product. In this way, users with a high probability of using the new product are automatically routed to the new product simply by entering the old-version product, which further improves the user hit rate. In one example, the APP is to release the new product "footprint 2.0" in gray release, which is an updated version of "footprint 1.0". The keywords included in the text of "footprint 2.0" are "credit, bill, action". From the plurality of products included in the APP and the keywords they include, the keyword vector $(1, 1, 1, 0, \ldots, 0)^{T}$ of the new product is obtained. In a case where the user has recently entered footprint 1.0, it may be determined from this preference data that footprint 1.0 is a product the user likes, footprint 1.0 including the keywords: credit, bill, action. Therefore, according to the above formula (5), the keyword vector of the user includes $(1, 1, 1, 0, \ldots, 0)^{T}$ multiplied by a weight. It can be determined through formula (6) that the similarity between the keyword vector of the user and the keyword vector of the new product is greater than the predetermined threshold, and the user is determined to be a target user of the new product. Thus, the user is caused to use footprint 2.0 when the user enters footprint again.
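The routing decision in this example can be sketched in Python as follows. The function `version_to_serve`, the weight 0.8 applied to the user vector, and the four-dimensional vectors are hypothetical; the 0.9 threshold is the example value mentioned for cosine similarity.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], w: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, w))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm > 0 else 0.0


def version_to_serve(user_vec: Sequence[float],
                     new_product_vec: Sequence[float],
                     threshold: float = 0.9,
                     old_version: str = "footprint 1.0",
                     new_version: str = "footprint 2.0") -> str:
    """Serve the new version only to users whose similarity exceeds the threshold."""
    if cosine(user_vec, new_product_vec) > threshold:
        return new_version
    return old_version


new_product_vec = [1, 1, 1, 0]
liked_old_version = [0.8, 0.8, 0.8, 0.0]   # (1,1,1,0) scaled by a hypothetical weight
unrelated_user = [0.0, 0.0, 0.0, 1.0]
print(version_to_serve(liked_old_version, new_product_vec))  # footprint 2.0
print(version_to_serve(unrelated_user, new_product_vec))     # footprint 1.0
```

Because cosine similarity ignores vector length, the weight applied to the user vector does not affect the outcome: a user whose vector is a positive multiple of the new product's vector reaches a similarity of essentially 1.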
In one embodiment, the method for gray release of a new product in a system according to the embodiments of the present specification may be performed periodically, for example once a day, to determine, according to the updated keyword vector of the user, whether the user has become a target user of the new product. For example, as in the above example, where the new product is an updated version of an old-version product, the user is not a target user of the new product if the similarity between the user keyword vector and the new product keyword vector is less than the predetermined threshold. Thus, when the user enters the old-version product, the old-version product is still used. However, after the user has recently entered the old-version product, the system determines, according to the preference data on recent use, that the old-version product is a product the user likes, and updates the keyword vector of the user. Thus, when the system performs the method again using the updated user keyword vector, it may determine that the user has become a target user of the new product. Then, when the user enters the old-version product again, the user is caused to use the new product.
FIG. 4 shows an apparatus 400 for gray release of a new product in a system. The apparatus 400 includes: a first obtaining unit 41 configured to obtain a plurality of first products in the system and a plurality of keywords included in the plurality of first products, where the plurality of first products include a new product to be released; a second obtaining unit 42 configured to obtain a weight of each keyword of the new product based on the plurality of first products and the plurality of keywords included in the plurality of first products; a third obtaining unit 43 configured to obtain a keyword vector of the new product according to the weights; a calculating unit 44 configured to calculate a similarity between a keyword vector, acquired in advance, of a user in the system and the keyword vector of the new product; and a determining unit 45 configured to determine, in a case where the similarity is greater than a predetermined threshold, that the user is a target user of the new product, and to release the new product to the target user.
In one embodiment, the new product is an updated version of an old-version product, and the determining unit is further configured to cause the user to use the new product when the user enters the old-version product.
In one embodiment, the second obtaining unit is further configured to obtain the weight of each keyword of the new product through a TF-IDF algorithm.
In one embodiment, the apparatus 400 further includes a fourth obtaining unit 46 configured to obtain the keyword vector of the user in advance. The fourth obtaining unit 46 specifically includes: a first obtaining subunit 461 configured to obtain a plurality of second products in the system and a plurality of keywords included in the plurality of second products, wherein the second products are open for use by the user; a second obtaining subunit 462 configured to obtain preference data of the user regarding the plurality of second products; a dividing subunit 463 configured to divide the plurality of second products into a positive sample set and a negative sample set according to the preference data; a third obtaining subunit 464 configured to obtain a weight of each keyword of each second product based on the plurality of second products and the plurality of keywords included in the plurality of second products; a fourth obtaining subunit 465 configured to obtain a keyword vector of each second product according to the weight of each keyword of that second product; and a calculating subunit 466 configured to calculate the keyword vector of the user by the Rocchio algorithm according to the positive sample set, the negative sample set, and the keyword vector of each second product.
According to the gray-release scheme of the embodiments of the present specification, active users can be effectively targeted during the gray release of a new product, so that the pace of the gray release can be effectively controlled, the effect of the gray release is ensured, and problems in the new product can be collected and resolved in a timely manner.
It will be further appreciated by those of ordinary skill in the art that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.