WO2023061276A1 - Data recommendation method and apparatus, electronic device, and storage medium
- Publication number
- WO2023061276A1 (application PCT/CN2022/124028)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- feature vector
- data
- vector
- target data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- The present application belongs to the technical field of the Internet, and in particular relates to a data recommendation method and apparatus, an electronic device, and a storage medium.
- The purpose of the embodiments of the present application is to provide a data recommendation method and apparatus, an electronic device, and a storage medium that can solve the problem that users need to locate data through cumbersome operations, which reduces the convenience of data uploading.
- the embodiment of the present application provides a data recommendation method, which includes:
- the word vector is input into the deep learning model, and the target data matched with the word vector is determined;
- the embodiment of the present application provides a data recommendation device, which includes:
- the first determining module is used to determine the word vector corresponding to the target page when the target page is displayed;
- the second determination module is used to input the word vector into the deep learning model, and determine the target data matched with the word vector;
- the first display module is used to display the target data.
- An embodiment of the present application provides an electronic device. The electronic device includes a processor, a memory, and a program or instruction stored in the memory and operable on the processor, and the program or instruction, when executed by the processor, implements the steps of the method described in the first aspect.
- An embodiment of the present application provides a readable storage medium on which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
- The embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions so as to implement the method described in the first aspect.
- In the embodiments of the present application, the word vector corresponding to the target page is determined; the word vector is input into the deep learning model, and the target data matching the word vector is determined; and the target data is displayed.
- When the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby improving the convenience of data uploading.
- FIG. 1 is a schematic diagram of the deep learning model provided by the embodiment of the present application.
- FIG. 2 is a schematic diagram of a file analysis model provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of a user behavior model provided by an embodiment of the present application.
- FIG. 4 is a flowchart of a data recommendation method provided in an embodiment of the present application.
- FIG. 5 is a first application scenario diagram of the data recommendation method provided by the embodiment of the present application.
- FIG. 6 is a second application scenario diagram of the data recommendation method provided by the embodiment of the present application.
- FIG. 7 is a schematic flowchart of a data recommendation method provided in an embodiment of the present application.
- FIG. 8 is a structural diagram of a data recommendation device provided by an embodiment of the present application.
- FIG. 9 is a structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 10 is a hardware structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 1 is a schematic diagram of the deep learning model provided by an embodiment of the present application. As shown in FIG. 1, the deep learning model may be a Deep Structured Semantic Model (DSSM).
- Generally speaking, the DSSM model is obtained from two network models in a federated learning environment, where federated learning is a machine learning technique.
- The DSSM model is also called the twin-tower model, wherein one network model can be a file analysis model, and the other network model can be a user behavior model.
- One of the network models in the DSSM model is the file analysis model.
- The training process of the file analysis model is as follows: obtain the file type and keywords of a training file, and convert the file type and keywords into word vectors. These word vectors are also called semantic embedding vectors, and they can be obtained using word embedding technology.
- The semantic embedding vectors are used as training samples of the file analysis model, so that the file analysis model outputs a file feature vector. It should be understood that the file feature vector is associated with the training file.
- The security level of the training file can be determined based on the file type and keywords, so that private files with a higher security level can be identified.
- The training file and the corresponding file feature vector are stored in a preset mapping table; that is, the mapping table stores the mapping relationship between file feature vectors and training files.
- The other network model in the DSSM model is the user behavior model.
- The training process of the user behavior model is as follows: relevant information that can represent user behavior is used as training information, such as the semantic embedding vector corresponding to the displayed page, user operation data, and files uploaded by the user; this training information is used as training samples of the user behavior model, so that the user behavior model outputs a behavior feature vector.
- The behavior feature vector characterizes the user's operation behavior, which includes the user's file selection preference in a specific scenario.
- The behavior feature vector and the file feature vector can be stored in a preset database, and the storage locations of the behavior feature vector and the file feature vector can be adjusted based on the correlation between them.
- For example, the input of the file analysis model is an identity file, and the output is the file feature vector corresponding to the identity file.
- The input of the user behavior model includes identity files and user operation data.
- The behavior feature vector output by the user behavior model and the file feature vector corresponding to the identity file can be stored in the database, and their storage locations in the database can be adjusted to reduce the Euclidean distance between the behavior feature vector and the file feature vector.
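One way to picture the adjustment described above is to treat each storage location as a point in a shared vector space and to move a correlated pair of vectors toward each other. The Python sketch below is only an illustration under that assumption. The function `pull_together` and its `rate` parameter are hypothetical and do not come from the application, which does not specify how the adjustment is performed.

```python
import math


def pull_together(behavior_vec, file_vec, rate=0.25):
    """Move each vector a fraction `rate` of the way toward the other.

    Hypothetical helper: with 0 < rate < 0.5 the Euclidean distance
    between the two vectors shrinks by the factor (1 - 2 * rate).
    """
    new_behavior = [b + rate * (f - b) for b, f in zip(behavior_vec, file_vec)]
    new_file = [f + rate * (b - f) for b, f in zip(behavior_vec, file_vec)]
    return new_behavior, new_file


# Toy two-dimensional vectors, chosen only for illustration.
behavior = [0.0, 0.0]
identity_file = [3.0, 4.0]

before = math.dist(behavior, identity_file)
behavior, identity_file = pull_together(behavior, identity_file)
after = math.dist(behavior, identity_file)
```

In a trained twin-tower model this effect would normally come from the training objective rather than from an explicit update like this one; the sketch only shows what "reducing the Euclidean distance" means numerically.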
- The data involved in the embodiments of this application can be files, user information, account information, and the like.
- The following explains the solution using an implementation scenario in which the data is a file. It should be understood that the data is not specifically limited here.
- FIG. 4 is a flowchart of a data recommendation method provided by an embodiment of the present application.
- the data recommendation method provided in the embodiment of the present application includes the following steps:
- the above-mentioned target page may be a data upload page, and the above-mentioned word vector is a semantic embedding vector.
- the data upload page may be of a certain website or of a certain application program.
- The word vectors corresponding to the target pages associated with an application may be the same or different, which is not specifically limited here.
- Different applications can also have the same target page.
- For example, different banking applications have ID card upload pages.
- In this case, the word vectors corresponding to the target pages of these applications may be the same or different, which is not specifically limited here.
- In one optional implementation, the target page is preset: if the currently displayed page is the preset page, the currently displayed page is determined to be the target page. In another optional implementation, Optical Character Recognition (OCR) is performed on the currently displayed page; if a specific field such as "file upload" or "picture upload" is detected, the currently displayed page is determined to be the target page.
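The two ways of recognizing a target page can be sketched as follows. This is a minimal Python illustration, assuming the OCR step has already produced the page text as a string. The names `is_target_page`, `UPLOAD_FIELDS`, and the page identifiers are hypothetical; only the example fields "file upload" and "picture upload" come from the text above.

```python
# Example fields named in the text; a real deployment would configure its own list.
UPLOAD_FIELDS = ("file upload", "picture upload")


def is_target_page(page_id, ocr_text, preset_pages=frozenset(), fields=UPLOAD_FIELDS):
    """Return True if the page is preset as a target page or its text contains a known field."""
    # Option 1: the page was registered in advance as a target page.
    if page_id in preset_pages:
        return True
    # Option 2: a specific field appears in the recognized page text.
    text = ocr_text.lower()
    return any(field in text for field in fields)


preset = frozenset({"bank_app/id_card_upload"})
by_preset = is_target_page("bank_app/id_card_upload", "", preset)
by_field = is_target_page("other_app/form", "Please use File Upload below", preset)
not_target = is_target_page("other_app/home", "Welcome back", preset)
```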
- FIG. 5 is a first application scenario diagram of the data recommendation method provided by the embodiment of the present application.
- FIG. 5 shows a scenario of a target page.
- There can be one or more word vectors corresponding to the target page; usually, there are multiple word vectors.
- S102: Input the word vector into a deep learning model, and determine target data matching the word vector.
- The word vector corresponding to the target page can be input into the trained DSSM model to determine the target data matching the word vector.
- The target data is the data recommended by the deep learning model that has a high degree of matching with the word vector.
- The target data is displayed on the current page.
- In the embodiments of the present application, the word vector corresponding to the target page is determined; the word vector is input into the deep learning model, and the target data matching the word vector is determined; and the target data is displayed.
- When the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby improving the convenience of data uploading.
- Inputting the word vector into the deep learning model and determining the target data matching the word vector includes:
- the word vector is input into the deep learning model to obtain the first target feature vector;
- the data corresponding to the second target feature vector is determined as the target data.
- The word vector can be input into the user behavior model to obtain the first target feature vector, wherein the first target feature vector is a behavior feature vector representing user behavior.
- A database is preset, and the database stores first feature vectors and second feature vectors. After the first target feature vector is obtained, at least one second target feature vector corresponding to the first target feature vector is determined in the database using a nearest neighbor search algorithm or another method.
- The second target feature vector is a file feature vector, which is used to characterize a file.
- The second target feature vector is looked up in a preset mapping table to obtain the corresponding data, wherein the mapping table stores the mapping relationship between second target feature vectors and data. Further, the data obtained by querying the mapping table is determined as the target data.
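The lookup described above can be sketched in Python as a brute-force nearest neighbor search followed by a mapping-table query. This is an illustration under assumptions, not the application's implementation: the vectors, file names, and the function `recommend_nearest` are hypothetical, and a production system would likely use an indexed nearest neighbor search instead of a linear scan.

```python
import math


def recommend_nearest(first_target_vec, mapping_table):
    """Return the data mapped to the file feature vector closest to first_target_vec.

    mapping_table maps a file feature vector (as a tuple) to the data it represents.
    """
    nearest_vec = min(mapping_table, key=lambda vec: math.dist(first_target_vec, vec))
    return mapping_table[nearest_vec]


# Toy mapping table: file feature vector -> file (hypothetical values).
mapping_table = {
    (0.9, 0.1): "id_card_front.jpg",
    (0.1, 0.8): "holiday_photo.png",
}

# A behavior feature vector, as the user behavior model might output it.
target = recommend_nearest((1.0, 0.0), mapping_table)
```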
- Determining at least one second target feature vector corresponding to the first target feature vector in the database includes:
- the storage location of the first target feature vector in the database is determined.
- The database stores first feature vectors and second feature vectors; the first target feature vector is a first feature vector, and the second target feature vector is a second feature vector, wherein a first feature vector is a behavior feature vector representing user behavior, and a second feature vector is a file feature vector characterizing a file.
- The nearest neighbor search algorithm is used to obtain the Euclidean distance between the first target feature vector and each second feature vector, where the Euclidean distance characterizes the distance between vectors.
- Other methods may also be used to calculate the Euclidean distance between vectors, which is not specifically limited here.
- A second feature vector whose Euclidean distance is less than or equal to the preset threshold is determined as a second target feature vector; there may be multiple second target feature vectors.
- In this way, the second target feature vector associated with the first target feature vector is determined.
- Displaying the target data includes:
- the target data corresponding to the second target feature vector is displayed in ascending order of the Euclidean distance.
- One second target feature vector corresponds to one piece of target data.
- The target data corresponding to the second target feature vectors are displayed in ascending order of the Euclidean distance between each second target feature vector and the first target feature vector. That is, the second target feature vector corresponding to the first displayed target data has the smallest Euclidean distance to the first target feature vector, and the second target feature vector corresponding to the last displayed target data has the largest Euclidean distance to the first target feature vector.
- The target data is sorted before display, and the target data whose second target feature vector is most strongly related to the first target feature vector is placed first, so that the target data most relevant to the target page is displayed preferentially. The user does not need to perform additional operations to query the target data, which improves the convenience of data uploading.
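The threshold selection and the ascending-order display can be combined in a short Python sketch. The function name `rank_candidates`, the file names, the vectors, and the threshold value are all hypothetical and are chosen only to make the ordering easy to follow.

```python
import math


def rank_candidates(first_target_vec, second_vectors, threshold):
    """Keep file vectors within `threshold` and order their data by ascending distance.

    second_vectors maps each piece of data to its file feature vector.
    """
    scored = [(math.dist(first_target_vec, vec), data) for data, vec in second_vectors.items()]
    kept = [(distance, data) for distance, data in scored if distance <= threshold]
    return [data for distance, data in sorted(kept)]


second_vectors = {
    "passport.pdf": (3.0, 4.0),   # distance 5 from the origin
    "id_card.jpg": (1.0, 0.0),    # distance 1
    "menu.png": (6.0, 8.0),       # distance 10, beyond the threshold below
}

ranked = rank_candidates((0.0, 0.0), second_vectors, threshold=5.0)
```

Here `ranked` lists `id_card.jpg` before `passport.pdf`, and `menu.png` is dropped because its distance exceeds the threshold.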
- Displaying the target data includes:
- a reminder mark is displayed in a preset area of the target data.
- The data type and keywords corresponding to the target data are acquired.
- In one optional implementation, when the data type is a preset type, the content of the target data involves private information; the target data is determined to be private data, and its security level is relatively high.
- In another optional implementation, the target data is determined to be private data when the keywords include a preset field.
- The keywords may be keywords in the data name, or keywords obtained after performing OCR processing on the data file.
- In another optional implementation, the target data is determined to be private data when the data type is a preset type and the keywords include a preset field.
- A reminder mark is displayed in a preset area of the target data, and the reminder mark includes but is not limited to text, an image, or a graphic.
- In this way, the target data related to private information is identified, and a reminder mark is displayed in the preset area of the private data to remind the user, which improves the security of the subsequent data upload process.
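The three optional ways of identifying private data can be expressed as one small predicate. The Python sketch below is an assumption-laden illustration: the preset types, preset fields, and the `require_both` flag are hypothetical stand-ins for whatever a real system would configure.

```python
# Hypothetical preset values; the application does not list concrete types or fields.
PRIVATE_TYPES = {"id_card", "bank_card"}
PRIVATE_FIELDS = ("passport", "id card")


def is_private(data_type, keywords, require_both=False):
    """Decide whether target data should carry a reminder mark.

    require_both=False: private if the type is preset OR a keyword contains a preset field.
    require_both=True:  private only if both conditions hold.
    """
    type_hit = data_type in PRIVATE_TYPES
    field_hit = any(field in keyword.lower() for keyword in keywords for field in PRIVATE_FIELDS)
    if require_both:
        return type_hit and field_hit
    return type_hit or field_hit


by_type = is_private("id_card", [])
by_keyword = is_private("photo", ["My Passport scan"])
neither = is_private("photo", ["beach"])
strict = is_private("id_card", ["beach"], require_both=True)
```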
- FIG. 6 is a second application scenario diagram of the data recommendation method provided by the embodiment of the present application.
- A dotted frame is displayed in the preset area of the target data. The dotted frame is a reminder mark that reminds the user that the file is private data.
- If the target data is a picture file, the picture is displayed directly; if the target data is a file, a preview image of the file is displayed.
- The method includes:
- displaying reminder information when the application program corresponding to the target page is the target application program.
- The input may be a touch input, a sliding input, or another type of input performed by the user on the target data.
- The application program corresponding to the target page and the security level corresponding to the target data are detected.
- If the application corresponding to the target page is the target application and the target data is private data, a reminder message is displayed.
- Whether the application associated with the target page is a target application can be determined according to the category of the application.
- The target application is an application that is not a government or enterprise application, such as a communication application or a film and television application.
- The reminder information may be text information or voice information.
- The reminder information may be a pop-up window displayed on the target page, and the content of the pop-up window is a text message of "currently a private file, continuing to select may lead to privacy disclosure".
- If the application program corresponding to the target page is not the target application program, or the target data is not private data, no reminder message is displayed after the user's input on the target data is received.
- When the application program corresponding to the target page is the target application program and the target data is private data, a reminder message is displayed, so as to prevent the user from leaking the private data, thereby improving the security of data uploading.
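The reminder rule above reduces to a two-condition check. The following Python sketch assumes that applications are classified by a category string. The category names and the function `reminder_for_selection` are hypothetical, while the reminder text is the example quoted above.

```python
# Hypothetical categories treated as target applications
# (applications that are not government or enterprise applications).
TARGET_APP_CATEGORIES = {"communication", "film_and_television"}

REMINDER_TEXT = "currently a private file, continuing to select may lead to privacy disclosure"


def reminder_for_selection(app_category, data_is_private):
    """Return the reminder text when private data is selected in a target application, else None."""
    if app_category in TARGET_APP_CATEGORIES and data_is_private:
        return REMINDER_TEXT
    return None


shown = reminder_for_selection("communication", True)
not_target_app = reminder_for_selection("government", True)
not_private = reminder_for_selection("communication", False)
```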
- Determining the word vector corresponding to the target page includes:
- Word embedding processing is performed on the keyword information to obtain the word vector.
- OCR processing is performed on the target page to identify text information corresponding to the target page, and the text information includes text displayed on the target page.
- Keywords of the text information are extracted to obtain keyword information.
- For example, the keywords may be extracted using Term Frequency–Inverse Document Frequency (TF-IDF) or a document topic generation model (LDA).
- Word embedding processing is performed on the keyword information to obtain a word vector corresponding to the keyword information.
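As an illustration of the keyword extraction step, the Python sketch below ranks the tokens of one page by a smoothed TF-IDF score against a small reference corpus. It assumes the OCR text has already been split into tokens. The corpus, the tokens, the smoothing formula, and the function `tfidf_keywords` are illustrative choices, not details taken from the application. The subsequent word embedding step would require a trained embedding model and is not shown.

```python
import math
from collections import Counter


def tfidf_keywords(page_tokens, corpus, top_k=2):
    """Rank the tokens of one page by TF-IDF against a reference corpus of token lists."""
    term_counts = Counter(page_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed inverse document frequency
        scores[term] = (count / len(page_tokens)) * idf
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


corpus = [
    ["please", "upload", "id", "card"],
    ["please", "login"],
    ["please", "upload", "photo"],
]
page_tokens = ["please", "upload", "id", "card", "id", "card"]

keywords = tfidf_keywords(page_tokens, corpus)
```

Common words such as "please" appear in every reference document and therefore score low, while "id" and "card" are frequent on this page and rare elsewhere.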
- FIG. 7 is a schematic flowchart of the data recommendation method provided by the embodiment of the present application.
- The semantic embedding vector corresponding to the file upload page is determined, wherein the file upload page is the target page, and the semantic embedding vector is the word vector.
- The behavior feature vector is queried in a preset database to obtain a file feature vector, wherein the behavior feature vector is the first target feature vector, and the file feature vector is the second target feature vector.
- The file to be uploaded corresponding to the file feature vector is determined using a preset mapping table, and the file to be uploaded is the target data. Further, the files to be uploaded are displayed.
- It is determined whether the application program corresponding to the file upload page is the target application program, and whether the file to be uploaded is a private file. If the application corresponding to the file upload page is the target application and the file to be uploaded is a private file, a reminder message is displayed. If the application program corresponding to the file upload page is not the target application program, and/or the file to be uploaded is not a private file, then the file to be uploaded is uploaded to the file upload page.
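The data flow of FIG. 7 up to the display step can be summarized as a chain of four stages. The Python sketch below passes each stage in as a callable so that the orchestration is visible without committing to any particular model. All names and the toy stand-in functions are hypothetical and exist only to make the sketch runnable.

```python
def recommend_files(page_text, embed, behavior_model, search_database, mapping_table):
    """Chain the stages: page text -> word vector -> behavior vector -> file vectors -> files."""
    word_vector = embed(page_text)                    # semantic embedding vector of the page
    behavior_vector = behavior_model(word_vector)     # first target feature vector
    file_vectors = search_database(behavior_vector)   # second target feature vectors
    return [mapping_table[vec] for vec in file_vectors]


# Toy stand-ins for the real components.
def toy_embed(text):
    return (float(len(text)),)


def toy_behavior_model(word_vector):
    return (word_vector[0] * 0.5,)


def toy_search(behavior_vector):
    return [(1.0,)]


toy_mapping_table = {(1.0,): "id_card.jpg"}

files_to_display = recommend_files(
    "file upload", toy_embed, toy_behavior_model, toy_search, toy_mapping_table
)
```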
- the data recommendation device 200 includes:
- the first determining module 201 is configured to determine a word vector corresponding to the target page when the target page is displayed;
- the second determination module 202 is used to input the word vector into the deep learning model, and determine the target data matched with the word vector;
- the first display module 203 is configured to display the target data.
- the second determining module 202 is specifically configured to:
- the word vector is input into the deep learning model to obtain the first target feature vector
- the data corresponding to the second target feature vector is determined as the target data.
- the second determining module 202 is also specifically configured to:
- the first display module 203 is specifically configured to:
- the target data corresponding to the second target feature vector is displayed in ascending order of the Euclidean distance.
- the first display module 203 is further specifically configured to:
- a reminder mark is displayed in a preset area of the target data.
- the data recommendation device 200 further includes:
- a receiving module configured to receive user input on the target data
- the second display module is configured to display reminder information when the application program corresponding to the target page is the target application program.
- the first determining module 201 is specifically configured to:
- Word embedding processing is performed on the keyword information to obtain the word vector.
- In the embodiments of the present application, the word vector corresponding to the target page is determined; the word vector is input into the deep learning model, and the target data matching the word vector is determined; and the target data is displayed.
- When the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby improving the convenience of data uploading.
- the data recommending device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal.
- the device may be a mobile electronic device or a non-mobile electronic device.
- The mobile electronic device can be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle electronic device, a wearable device, an ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), a netbook, or a personal digital assistant (Personal Digital Assistant, PDA).
- The non-mobile electronic device can be a server, network attached storage (Network Attached Storage, NAS), a personal computer (Personal Computer, PC), a television (Television, TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.
- the data recommendation device in the embodiment of the present application may be a device with an operating system.
- The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
- The data recommendation device provided in the embodiment of the present application can realize each process realized in the method embodiment in FIG. 4, and details are not repeated here to avoid repetition.
- The embodiment of the present application further provides an electronic device 300, including a processor 301, a memory 302, and a program or instruction stored in the memory 302 and operable on the processor 301. When the program or instruction is executed by the processor 301, each process of the above-mentioned data recommendation method embodiment is realized, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
- FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
- The electronic device 1000 includes, but is not limited to, components such as a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
- The electronic device 1000 can also include a power supply (such as a battery) for supplying power to the various components. The power supply can be logically connected to the processor 1010 through a power management system, so that functions such as charging management, discharging management, and power consumption management are realized through the power management system.
- The structure of the electronic device shown in FIG. 10 does not constitute a limitation to the electronic device.
- The electronic device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently, and details are not repeated here.
- the processor 1010 is further configured to determine a word vector corresponding to the target page when the target page is displayed;
- the word vector is input into the deep learning model, and the target data matched with the word vector is determined;
- the display unit 1006 is further configured to display the target data.
- the processor 1010 is further configured to input the word vector into the deep learning model to obtain the first target feature vector;
- the processor 1010 is also configured to determine the storage location of the first target feature vector in the database
- the display unit 1006 is further configured to display the target data corresponding to the second target feature vector in ascending order of the Euclidean distance.
- the processor 1010 is further configured to obtain the data type and keywords corresponding to the target data;
- the display unit 1006 is further configured to display a reminder mark in a preset area of the target data when the data type is a preset type and/or the keyword includes a preset field.
- the user input unit 1007 is also used to receive the input of the target data from the user;
- the display unit 1006 is further configured to display reminder information when the application program corresponding to the target page is the target application program.
- processor 1010 is further configured to perform optical character recognition processing on the target page to obtain text information
- Word embedding processing is performed on the keyword information to obtain the word vector.
- In the embodiments of the present application, the word vector corresponding to the target page is determined; the word vector is input into the deep learning model, and the target data matching the word vector is determined; and the target data is displayed.
- When the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby improving the convenience of data uploading.
- The input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
- The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
- The user input unit 1007 includes a touch panel 10071 and other input devices 10072.
- the touch panel 10071 is also called a touch screen.
- the touch panel 10071 may include two parts, a touch detection device and a touch controller.
- Other input devices 10072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, and joysticks, which will not be repeated here.
- the memory 1009 can be used to store software programs as well as various data, including but not limited to application programs and operating systems.
- The processor 1010 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface, application programs, and the like, and the modem processor mainly processes wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1010.
- The embodiment of the present application also provides a readable storage medium. The readable storage medium stores a program or an instruction, and when the program or instruction is executed by a processor, each process of the above-mentioned data recommendation method embodiment is realized, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the processor is the processor in the electronic device described in the above embodiments.
- the readable storage medium includes computer readable storage medium, such as computer read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
- the embodiment of the present application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the above data recommendation method embodiment
- chips mentioned in the embodiments of the present application may also be called system-level chips, system chips, chip systems, or system-on-a-chip.
- the terms “comprising”, “including” or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitation, an element defined by the phrase “comprising a ...” does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
- the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed in a substantially simultaneous manner or in reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioethics (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Medical Informatics (AREA)
- Probability & Statistics with Applications (AREA)
- User Interface Of Digital Computer (AREA)
- Machine Translation (AREA)
Abstract
The present application provides a data recommendation method and apparatus, an electronic device, and a storage medium. The method comprises: when a target page is displayed, determining a word vector corresponding to the target page; inputting the word vector into a deep learning model, and determining target data matching the word vector; and displaying the target data.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111182460.7, filed in China on October 11, 2021, the entire contents of which are incorporated herein by reference.
The present application belongs to the field of Internet technology, and in particular relates to a data recommendation method and apparatus, an electronic device, and a storage medium.
With the rapid development of the mobile Internet, more and more users choose to handle business online using electronic devices. However, when a user submits business-related data online, for example a business-related file, the user has to manually browse all the data stored on the electronic device, locate the relevant data, and only then upload it.
In this process, the user has to find the business-related data through relatively cumbersome operations, which reduces the convenience of data upload.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a data recommendation method and apparatus, an electronic device, and a storage medium, which can solve the problem that a user has to locate data through relatively cumbersome operations, which reduces the convenience of data upload.
In a first aspect, an embodiment of the present application provides a data recommendation method, the method including:
when a target page is displayed, determining a word vector corresponding to the target page;
inputting the word vector into a deep learning model, and determining target data matching the word vector; and
displaying the target data.
In a second aspect, an embodiment of the present application provides a data recommendation apparatus, the apparatus including:
a first determination module, configured to determine, when a target page is displayed, a word vector corresponding to the target page;
a second determination module, configured to input the word vector into a deep learning model and determine target data matching the word vector; and
a first display module, configured to display the target data.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instructions, where the program or instructions, when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the method according to the first aspect.
In the embodiments of the present application, when a target page is displayed, a word vector corresponding to the target page is determined; the word vector is input into a deep learning model, and target data matching the word vector is determined; and the target data is displayed. In this way, when the user submits business-related data online, the user does not need to locate the data through relatively cumbersome operations; instead, the target data is displayed directly when the target page is displayed. This reduces the steps the user takes to locate data and recommends relevant data to the user, thereby improving the convenience of data upload.
FIG. 1 is a schematic diagram of a deep learning model provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a file analysis model provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a user behavior model provided by an embodiment of the present application;
FIG. 4 is a flowchart of a data recommendation method provided by an embodiment of the present application;
FIG. 5 is a first application scenario diagram of the data recommendation method provided by an embodiment of the present application;
FIG. 6 is a second application scenario diagram of the data recommendation method provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of a data recommendation method provided by an embodiment of the present application;
FIG. 8 is a structural diagram of a data recommendation apparatus provided by an embodiment of the present application;
FIG. 9 is a structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 10 is a hardware structural diagram of an electronic device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application fall within the protection scope of this application.
The terms "first", "second" and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one type, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
To aid understanding of the solution provided by this application, the following content is described first:
Please refer to FIG. 1, which is a schematic diagram of the deep learning model provided by an embodiment of the present application. As shown in FIG. 1, the deep learning model may be a Deep Structured Semantic Model (DSSM). Generally, a DSSM model is obtained from two network models in a federated learning environment, where federated learning is a machine learning technique. The DSSM model is also known as a two-tower model; one network model may be a file analysis model, and the other may be a user behavior model.
Please refer to FIG. 2 and FIG. 3. As shown in FIG. 2, one network model in the DSSM model is the file analysis model. The file analysis model is trained as follows: the file type and keywords of a training file are obtained and converted into word vectors, also called semantic embedding vectors, which can be obtained using word embedding techniques. The semantic embedding vectors are used as training samples for the file analysis model, so that the file analysis model outputs a file feature vector. It should be understood that the file feature vector is associated with the training file.
While obtaining the file type and keywords of a training file, the security level of the training file can be determined based on the file type and keywords, so that private files with a higher security level are identified.
During training of the file analysis model, each training file and its corresponding file feature vector are stored in a preset mapping table; that is, the mapping table stores the mapping relationship between file feature vectors and training files.
As shown in FIG. 3, the other network model in the DSSM model is the user behavior model. The user behavior model is trained as follows: information that can characterize user behavior is used as training information, for example the semantic embedding vector corresponding to a displayed page, the user's operation data, and the files the user uploads. This training information is used as training samples for the user behavior model, so that the user behavior model outputs a behavior feature vector. It should be understood that the behavior feature vector characterizes the user's operation behavior, which includes the user's file selection preferences in a specific scenario.
During training of the DSSM model, the behavior feature vectors and the file feature vectors can be stored in a preset database, and the storage location of a behavior feature vector and the storage location of a file feature vector are adjusted based on the correlation between the two vectors.
For example, in one training pass of the file analysis model, the input is an identity file and the output is the file feature vector corresponding to that identity file. In one training pass of the user behavior model, if the input includes the identity file and the user's operation data, the behavior feature vector output by the user behavior model and the file feature vector corresponding to the identity file can be stored in the database, and their storage locations in the database are adjusted so as to reduce the Euclidean distance between the behavior feature vector and the file feature vector.
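The idea of reducing the Euclidean distance between a correlated pair of vectors can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two vectors are adjusted directly by one gradient step on their squared Euclidean distance; in an actual two-tower model the adjustment would instead flow into the network weights. The function name `pull_together` and the learning rate `lr` are hypothetical and do not come from the application.

```python
import math

def pull_together(behavior_vec, file_vec, lr=0.1):
    """Take one gradient step on the squared Euclidean distance ||b - f||^2.

    The gradient with respect to b is 2(b - f) and with respect to f is
    -2(b - f), so both vectors move toward each other.
    """
    diff = [b - f for b, f in zip(behavior_vec, file_vec)]
    new_behavior = [b - lr * 2 * d for b, d in zip(behavior_vec, diff)]
    new_file = [f + lr * 2 * d for f, d in zip(file_vec, diff)]
    return new_behavior, new_file

# Hypothetical 2-dimensional vectors for a behavior sample and an identity file.
behavior = [1.0, 0.0]
id_file = [0.0, 1.0]

before = math.dist(behavior, id_file)
behavior, id_file = pull_together(behavior, id_file)
after = math.dist(behavior, id_file)
```

With this update rule the distance shrinks by a factor of `1 - 4 * lr` per step, so `lr` would need to stay below 0.25 for the pair to converge rather than overshoot.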
The data recommendation method provided by the embodiments of the present application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
The data involved in the embodiments of this application may be files, user information, account information, or the like. For clarity, the solution is explained below using the scenario in which the data is a file; it should be understood that the data is not specifically limited here.
Please refer to FIG. 4, which is a flowchart of the data recommendation method provided by an embodiment of the present application. The data recommendation method provided by the embodiment of the present application includes the following steps:
S101: When a target page is displayed, determine a word vector corresponding to the target page.
The target page may be a data upload page, and the word vector is the semantic embedding vector described above. The data upload page may belong to a website or to an application. For a single application, the word vectors corresponding to the target pages associated with that application may be the same or different, which is not specifically limited here. Different applications may also have the same target page; for example, different banking applications all have an ID card upload page. Across different applications, the word vectors corresponding to their target pages may be the same or different, which is not specifically limited here.
In one optional implementation, the target page is preset: if the currently displayed page is a preset page, the currently displayed page is determined to be the target page. In another optional implementation, optical character recognition (OCR) is performed on the currently displayed page, and if a specific field such as "file upload" or "picture upload" is detected, the currently displayed page is determined to be the target page.
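The two optional ways of recognizing a target page can be sketched as follows. This is a minimal illustration that assumes the OCR step has already produced a text string; the page identifier, the `PRESET_PAGES` set, and the `UPLOAD_MARKERS` tuple are hypothetical names introduced here, not part of the application.

```python
# Hypothetical identifiers of pages registered in advance as target pages.
PRESET_PAGES = {"bank_app/id_upload"}

# Specific fields whose presence in the OCR text marks an upload page.
UPLOAD_MARKERS = ("file upload", "picture upload")

def is_target_page(page_id, ocr_text):
    """Return True if the page is preset or its OCR text contains a marker."""
    if page_id in PRESET_PAGES:
        return True
    text = ocr_text.lower()
    return any(marker in text for marker in UPLOAD_MARKERS)
```

A page that is neither preset nor contains one of the marker strings is treated as an ordinary page, and no recommendation is triggered for it.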
For ease of understanding, please refer to FIG. 5, which is the first application scenario diagram of the data recommendation method provided by an embodiment of the present application. FIG. 5 shows one scenario of a target page.
It should be understood that the specific technical solution for determining the word vector corresponding to the target page is described in subsequent embodiments.
The target page may correspond to one or more word vectors; usually there are several.
S102: Input the word vector into a deep learning model, and determine target data matching the word vector.
As described above, the word vector corresponding to the target page can be input into the trained DSSM model to determine the target data matching the word vector. The target data is the data recommended by the deep learning model as having a high degree of matching with the word vector. For the specific technical solution, please refer to subsequent embodiments.
S103: Display the target data.
In this step, after the target data is determined, the target data is displayed on the current page.
In this embodiment, when a target page is displayed, a word vector corresponding to the target page is determined; the word vector is input into a deep learning model, and target data matching the word vector is determined; and the target data is displayed. In this way, when the user submits business-related data online, the user does not need to locate the data through relatively cumbersome operations; instead, the target data is displayed directly when the target page is displayed. This reduces the steps the user takes to locate data and recommends relevant data to the user, thereby improving the convenience of data upload.
Optionally, inputting the word vector into the deep learning model and determining the target data matching the word vector includes:
inputting the word vector into the deep learning model to obtain a first target feature vector;
determining at least one second target feature vector corresponding to the first target feature vector in a database; and
determining the data corresponding to the second target feature vector as the target data.
In this embodiment, the word vector can be input into the user behavior analysis model to obtain the first target feature vector, where the first target feature vector is a behavior feature vector characterizing user behavior.
In this embodiment, a database is preset, and the database stores first feature vectors and second feature vectors. After the first target feature vector is obtained, a nearest neighbor search algorithm or another method can be applied to the first target feature vector in the database to determine at least one second target feature vector corresponding to it. The second target feature vector is a file feature vector and is used to characterize a file.
As described above, during training of the file analysis model, each training file and its corresponding file feature vector are stored in a preset mapping table.
In this embodiment, after the second target feature vector is obtained, it is looked up in the preset mapping table, which stores the mapping relationship between second target feature vectors and data, and the data returned by the lookup is determined as the target data.
Optionally, determining at least one second target feature vector corresponding to the first target feature vector in the database includes:
determining the storage location of the first target feature vector in the database;
based on the storage location, using a nearest neighbor search algorithm to obtain the Euclidean distance between the first target feature vector and each second feature vector; and
determining each second feature vector whose Euclidean distance is less than or equal to a preset threshold as a second target feature vector.
In this embodiment, after the first target feature vector is obtained, its storage location in the database is determined. It should be understood that the database stores first feature vectors and second feature vectors; the first target feature vector is a first feature vector, and the second target feature vector is a second feature vector. A first feature vector is a behavior feature vector characterizing user behavior, and a second feature vector is a file feature vector characterizing a file.
Based on this storage location, a nearest neighbor search algorithm is used to obtain the Euclidean distance between the first target feature vector and each second feature vector, where the Euclidean distance characterizes the distance between vectors. In other embodiments, other methods may also be used to calculate the Euclidean distance between vectors, which is not specifically limited here.
Further, each second feature vector whose Euclidean distance is less than or equal to the preset threshold is determined as a second target feature vector; there may be multiple second target feature vectors.
In this embodiment, the second target feature vectors associated with the first target feature vector are determined based on the Euclidean distance between the first target feature vector and the second feature vectors.
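The threshold-based selection described above can be sketched as below. This is an illustration under stated assumptions: a brute-force linear scan stands in for the nearest neighbor search algorithm, and the mapping table is folded into a plain dictionary keyed by file name. The function name, file names, vectors, and threshold are all hypothetical.

```python
import math

def find_second_target_vectors(first_target, second_vectors, threshold):
    """Return (file name, distance) pairs whose Euclidean distance to
    first_target is less than or equal to threshold."""
    matches = []
    for file_name, vector in second_vectors.items():
        distance = math.dist(first_target, vector)
        if distance <= threshold:
            matches.append((file_name, distance))
    return matches

# Hypothetical database of file feature vectors (second feature vectors).
database = {
    "id_card.jpg": [0.9, 0.1],
    "holiday.png": [0.0, 1.0],
    "passport.pdf": [0.7, 0.3],
}

# Hypothetical behavior feature vector (first target feature vector).
matches = find_second_target_vectors([1.0, 0.0], database, threshold=0.5)
matched_names = [name for name, _ in matches]
```

For a large database, an approximate nearest neighbor index would usually replace the linear scan; the selection rule itself stays the same.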
Optionally, displaying the target data includes:
displaying the target data corresponding to the second target feature vectors in ascending order of Euclidean distance.
It should be understood that each second target feature vector corresponds to one piece of target data. In this embodiment, after the second target feature vectors are obtained, the corresponding target data is displayed in ascending order of the Euclidean distance between each second target feature vector and the first target feature vector. That is, the first piece of target data displayed corresponds to the second target feature vector with the smallest Euclidean distance to the first target feature vector, and the last piece displayed corresponds to the one with the largest.
It should be understood that, during training of the DSSM model, the stronger the correlation between a behavior feature vector and a file feature vector, the shorter the Euclidean distance between them.
In this embodiment, the target data is sorted for display, with the target data whose second target feature vector is most strongly correlated with the first target feature vector placed first. Target data related to the target page is thus displayed preferentially, and the user does not need to perform additional operations to find the target data, which improves the convenience of data upload.
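The display ordering amounts to an ascending sort on distance. The sketch below assumes the matching step has already produced (file name, distance) pairs; the function name and the sample values are hypothetical.

```python
def order_for_display(matches):
    """Sort (file name, distance) pairs by ascending Euclidean distance
    and return only the file names, closest match first."""
    return [name for name, _ in sorted(matches, key=lambda pair: pair[1])]

# Hypothetical distances between each file vector and the behavior vector.
matches = [
    ("passport.pdf", 0.42),
    ("id_card.jpg", 0.14),
    ("contract.docx", 0.30),
]
display_order = order_for_display(matches)
```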
Optionally, displaying the target data includes:
obtaining the data type and keywords corresponding to the target data; and
when the data type is a preset type and/or the keywords include a preset field, displaying a reminder mark in a preset area of the target data.
In this embodiment, after the target data corresponding to the second target feature vector is determined, the data type and keywords corresponding to the target data are obtained.
In one optional implementation, when the data type is a preset type, the content of the target data is considered to involve private information, and the target data is determined to be private data with a relatively high security level.
In another optional implementation, when the keywords include a preset field, the target data is determined to be private data. The keywords may be keywords in the data name, or keywords obtained by performing OCR processing on the data file.
In another optional implementation, when the data type is a preset type and the keywords include a preset field, the target data is determined to be private data.
Further, a reminder mark is displayed in a preset area of the target data; the reminder mark includes but is not limited to text, an image, or a graphic.
In this embodiment, target data involving private information is identified according to its data type and keywords, and a reminder mark is displayed in the preset area of the private data to alert the user, improving the security of the subsequent data upload process.
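The three optional privacy checks can be sketched with a single function and a mode switch. The preset types, preset fields, and the `mode` parameter are hypothetical values chosen for illustration; the application does not specify them.

```python
# Hypothetical preset data types and preset keyword fields.
PRIVATE_TYPES = {"id_document", "bank_statement"}
PRIVATE_FIELDS = ("id card", "passport", "password")

def is_private(data_type, keywords, mode="either"):
    """Decide whether target data should carry a reminder mark.

    mode="type":    the data type alone decides.
    mode="keyword": the keywords alone decide.
    mode="both":    type and keywords must both match.
    mode="either":  a match on type or keywords is enough.
    """
    type_hit = data_type in PRIVATE_TYPES
    field_hit = any(
        field in keyword.lower()
        for keyword in keywords
        for field in PRIVATE_FIELDS
    )
    if mode == "type":
        return type_hit
    if mode == "keyword":
        return field_hit
    if mode == "both":
        return type_hit and field_hit
    return type_hit or field_hit
```

The stricter `"both"` mode flags fewer files, which trades fewer false alarms for a higher chance of missing a private file.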
For ease of understanding, refer to FIG. 6, which is the second application scenario diagram of the data recommendation method provided by the embodiments of the present application. As shown in FIG. 6, a dashed box is displayed in the preset area of the target data. This dashed box is the reminder mark, which reminds the user that the file is private data. In the scenario shown in FIG. 6, if the target data is a picture file, the picture is displayed directly; if the target data is a file rather than a picture, a preview image of the file is displayed.
Optionally, after the displaying the target data, the method includes:
receiving a user's input on the target data;
displaying reminder information in a case where the application program corresponding to the target page is a target application program.
The above input may be a touch input, a sliding input, or another type of input performed by the user on the target data.
In this embodiment, after the user's input on the target data is received, the application program corresponding to the target page and the security level corresponding to the target data are detected. When the application program corresponding to the target page is the target application program and the target data is private data, reminder information is displayed.
It should be understood that whether the application program associated with the target page is the target application program can be determined from the category of the application program. Optionally, the target application program is a non-government-and-enterprise application program, such as a communication application or a film and television application.
It should be understood that the reminder information may be text information or voice information. For example, the reminder information may be a pop-up window displayed on the target page, with the text "This is a private file; continuing to select it may lead to privacy disclosure".
In other embodiments, if the application program corresponding to the target page is not the target application program, or the target data is not private data, no reminder information is displayed after the user's input on the target data is received.
In this embodiment, reminder information is displayed when the application program corresponding to the target page is the target application program and the target data is private data. This helps prevent the user from leaking private data and thereby improves the security of data uploading.
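The reminder condition described above can be sketched in Python. This is a minimal illustration, not the applicant's implementation: the category labels in `TARGET_APP_CATEGORIES`, the `TargetData` type, and the function names are all hypothetical, and the embodiment only states that target applications are non-government-and-enterprise applications such as communication or film and television applications.

```python
from dataclasses import dataclass

# Hypothetical category labels for "target" (non-government-and-enterprise) apps.
TARGET_APP_CATEGORIES = {"communication", "film_and_television"}


@dataclass
class TargetData:
    name: str
    is_private: bool  # assumed to be derived from the data's security level


def should_show_reminder(app_category: str, data: TargetData) -> bool:
    """True only when the page's app is a target app AND the data is private."""
    return app_category in TARGET_APP_CATEGORIES and data.is_private


def on_user_input(app_category: str, data: TargetData) -> str:
    """Return the reminder text to display, or an empty string for no reminder."""
    if should_show_reminder(app_category, data):
        return ("This is a private file; continuing to select it "
                "may lead to privacy disclosure")
    return ""
```

Keeping the decision in one predicate makes both conditions explicit, so that an input on non-private data, or on a page belonging to a non-target application, produces no reminder.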
The technical solution for determining the word vector corresponding to the target page is described in detail below:
Optionally, the determining the word vector corresponding to the target page includes:
performing optical character recognition processing on the target page to obtain text information;
performing keyword extraction on the text information to obtain keyword information;
performing word embedding processing on the keyword information to obtain the word vector.
In this embodiment, when the target page is displayed, OCR processing is performed on the target page to recognize the text information corresponding to the target page. The text information includes the text displayed on the target page.
Keywords are extracted from the text information to obtain keyword information. The keywords may be extracted using the term frequency-inverse document frequency (TF-IDF) algorithm, the latent Dirichlet allocation (LDA) topic model, or other methods; this embodiment places no specific limitation on this.
After the keyword information is obtained, word embedding processing is performed on the keyword information to obtain the word vector corresponding to the keyword information.
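As a rough illustration of the keyword extraction and word embedding steps, the sketch below ranks the tokens of one page by a smoothed TF-IDF score and then averages per-keyword vectors into a single page vector. Several things here are assumptions rather than details from the application: the OCR output is taken as an already tokenized list, the reference corpus and the `embedding_table` lookup are hypothetical, and averaging is only one of many possible ways to combine keyword embeddings.

```python
import math
from collections import Counter


def tfidf_keywords(page_tokens, corpus, top_k=3):
    """Rank the tokens of one page by TF-IDF against a reference corpus.

    page_tokens: list of tokens recognized on the page (e.g. from OCR).
    corpus: list of token sets, one per reference document.
    """
    term_counts = Counter(page_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF
        scores[term] = (count / len(page_tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def embed_keywords(keywords, embedding_table, dim):
    """Average the vectors of known keywords into one page-level vector."""
    vectors = [embedding_table[k] for k in keywords if k in embedding_table]
    if not vectors:
        return [0.0] * dim  # no known keyword: fall back to a zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

In practice the embedding table would come from a pretrained word embedding model; the dictionary lookup used here only stands in for that step.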
To help in understanding the overall solution, refer to FIG. 7, which is a schematic flowchart of the data recommendation method provided by the embodiments of the present application.
As shown in FIG. 7, when a file upload page is displayed, the semantic embedding vector corresponding to the file upload page is determined, where the file upload page is the target page and the semantic embedding vector is the word vector. The semantic embedding vector is input into the trained deep learning model to obtain a behavior feature vector, where the trained deep learning model is the deep learning model described above. The behavior feature vector is used to query a preset database to obtain a file feature vector, where the behavior feature vector is the first target feature vector and the file feature vector is the second target feature vector. A preset mapping table is used to determine the file to be uploaded that corresponds to the file feature vector; the file to be uploaded is the target data. The file to be uploaded is then displayed. When an input on the file to be uploaded is received, it is determined whether the application program corresponding to the file upload page is the target application program and whether the file to be uploaded is a private file. If the application program corresponding to the file upload page is the target application program and the file to be uploaded is a private file, reminder information is displayed. If the application program corresponding to the file upload page is not the target application program, and/or the file to be uploaded is not a private file, the file to be uploaded is uploaded to the file upload page.
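The data flow of FIG. 7, from the page vector to the recommended files, can be sketched as a composition of three stages. This is only an outline under stated assumptions: `model`, `search`, and the string keys of `mapping_table` are hypothetical placeholders for the trained deep learning model, the database query, and the preset mapping table, none of whose interfaces are specified in the application.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def recommend_files(
    page_vector: Vector,
    model: Callable[[Vector], Vector],
    search: Callable[[Vector], List[str]],
    mapping_table: Dict[str, str],
) -> List[str]:
    """Map a page's semantic embedding vector to the files to be displayed."""
    # Stage 1: semantic embedding vector -> behavior feature vector
    # (the first target feature vector).
    behavior_vector = model(page_vector)
    # Stage 2: query the database for matching file feature vectors
    # (the second target feature vectors), returned here as hypothetical keys.
    file_vector_keys = search(behavior_vector)
    # Stage 3: resolve each file feature vector to a file via the mapping table,
    # skipping keys that have no entry.
    return [mapping_table[k] for k in file_vector_keys if k in mapping_table]
```

Passing the model and the search routine in as callables keeps the sketch independent of any particular model or index implementation.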
As shown in FIG. 8, the data recommendation apparatus 200 includes:
a first determining module 201, configured to determine, when a target page is displayed, a word vector corresponding to the target page;
a second determining module 202, configured to input the word vector into a deep learning model and determine target data matching the word vector;
a first display module 203, configured to display the target data.
Optionally, the second determining module 202 is specifically configured to:
input the word vector into the deep learning model to obtain a first target feature vector;
determine at least one second target feature vector corresponding to the first target feature vector in a database;
determine the data corresponding to the second target feature vector as the target data.
Optionally, the second determining module 202 is further specifically configured to:
determine the storage location of the first target feature vector in the database;
based on the storage location, use a nearest neighbor search algorithm to obtain the Euclidean distance between the first target feature vector and a second feature vector;
determine a second feature vector whose Euclidean distance is less than or equal to a preset threshold as the second target feature vector.
Optionally, the first display module 203 is specifically configured to:
display the target data corresponding to the second target feature vector in ascending order of Euclidean distance.
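The threshold-based matching and ascending display order can be illustrated with a small Python sketch. It uses a brute-force linear scan as a stand-in for the nearest neighbor search algorithm, which the application does not specify further; the function name, the dictionary of candidate vectors, and the string keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def find_target_vectors(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (key, distance) pairs within the threshold, nearest first.

    query: the first target feature vector.
    candidates: second feature vectors stored in the database, by key.
    """
    # Brute-force scan: compute the Euclidean distance to every candidate.
    distances = [(key, math.dist(query, vec)) for key, vec in candidates.items()]
    # Keep only candidates whose distance is <= the preset threshold.
    hits = [(key, d) for key, d in distances if d <= threshold]
    # Ascending distance gives the display order described above.
    return sorted(hits, key=lambda hit: hit[1])
```

For a large database, an index structure such as a k-d tree or ball tree would typically replace the linear scan, while the threshold filter and the ascending sort would stay the same.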
Optionally, the first display module 203 is further specifically configured to:
obtain the data type and keywords corresponding to the target data;
display a reminder mark in a preset area of the target data when the data type is a preset type and/or the keywords include a preset field.
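The "and/or" condition for showing the reminder mark amounts to a logical OR of two checks. The sketch below assumes hypothetical preset types and preset fields; the application does not list concrete values in this passage.

```python
from typing import Iterable

# Hypothetical presets, chosen only for illustration.
PRESET_TYPES = {"id_card", "bank_card"}
PRESET_FIELDS = ("password", "ID number")


def needs_reminder_mark(data_type: str, keywords: Iterable[str]) -> bool:
    """True if the data type is preset OR any keyword contains a preset field."""
    type_hit = data_type in PRESET_TYPES
    field_hit = any(field in keyword
                    for keyword in keywords
                    for field in PRESET_FIELDS)
    return type_hit or field_hit
```

Either check alone is enough to trigger the mark, which matches the "and/or" wording: the mark is shown when one or both conditions hold.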
Optionally, the data recommendation apparatus 200 further includes:
a receiving module, configured to receive a user's input on the target data;
a second display module, configured to display reminder information when the application program corresponding to the target page is a target application program.
Optionally, the first determining module 201 is specifically configured to:
perform optical character recognition processing on the target page to obtain text information;
perform keyword extraction on the text information to obtain keyword information;
perform word embedding processing on the keyword information to obtain the word vector.
In the embodiments of the present application, when a target page is displayed, the word vector corresponding to the target page is determined; the word vector is input into a deep learning model to determine target data matching the word vector; and the target data is displayed. In this way, when the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby making data uploading more convenient.
The data recommendation apparatus in the embodiments of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine. The embodiments of the present application place no specific limitation on this.
The data recommendation apparatus in the embodiments of the present application may be an apparatus with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The data recommendation apparatus provided by the embodiments of the present application can implement each process implemented by the method embodiment of FIG. 4. To avoid repetition, details are not described here again.
Optionally, as shown in FIG. 9, an embodiment of the present application further provides an electronic device 300, including a processor 301, a memory 302, and a program or instructions stored in the memory 302 and executable on the processor 301. When executed by the processor 301, the program or instructions implement each process of the above data recommendation method embodiment and can achieve the same technical effect. To avoid repetition, details are not described here again.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to, components such as a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will understand that the electronic device 1000 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 1010 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 10 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which is not described further here.
The processor 1010 is further configured to determine, when a target page is displayed, a word vector corresponding to the target page;
and to input the word vector into a deep learning model and determine target data matching the word vector;
the display unit 1006 is further configured to display the target data.
The processor 1010 is further configured to input the word vector into the deep learning model to obtain a first target feature vector;
and to determine at least one second target feature vector corresponding to the first target feature vector in a database.
The processor 1010 is further configured to determine the storage location of the first target feature vector in the database;
based on the storage location, to use a nearest neighbor search algorithm to obtain the Euclidean distance between the first target feature vector and a second feature vector;
and to determine a second feature vector whose Euclidean distance is less than or equal to a preset threshold as the second target feature vector.
The display unit 1006 is further configured to display the target data corresponding to the second target feature vector in ascending order of Euclidean distance.
The processor 1010 is further configured to obtain the data type and keywords corresponding to the target data;
the display unit 1006 is further configured to display a reminder mark in a preset area of the target data when the data type is a preset type and/or the keywords include a preset field.
The user input unit 1007 is further configured to receive a user's input on the target data;
the display unit 1006 is further configured to display reminder information when the application program corresponding to the target page is a target application program.
The processor 1010 is further configured to perform optical character recognition processing on the target page to obtain text information;
to perform keyword extraction on the text information to obtain keyword information;
and to perform word embedding processing on the keyword information to obtain the word vector.
In the embodiments of the present application, when a target page is displayed, the word vector corresponding to the target page is determined; the word vector is input into a deep learning model to determine target data matching the word vector; and the target data is displayed. In this way, when the user submits business-related data online, the user does not need to locate the data through cumbersome operations. Instead, the target data is displayed directly when the target page is displayed, which reduces the steps the user takes to locate the data and recommends relevant data to the user, thereby making data uploading more convenient.
It should be understood that, in the embodiments of the present application, the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042. The graphics processing unit 10041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also called a touch screen and may include two parts: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described further here. The memory 1009 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1010 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1010.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the above data recommendation method embodiment and can achieve the same technical effect. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above data recommendation method embodiment, with the same technical effect. To avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; it may also include performing functions in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Furthermore, features described with reference to certain examples may be combined in other examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the related art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.
Claims (19)
- A data recommendation method, comprising:
  determining, in a case where a target page is displayed, a word vector corresponding to the target page;
  inputting the word vector into a deep learning model, and determining target data matching the word vector;
  displaying the target data.
- The method according to claim 1, wherein the inputting the word vector into a deep learning model, and determining target data matching the word vector comprises:
  inputting the word vector into the deep learning model to obtain a first target feature vector, the first target feature vector being used to represent user behavior;
  determining at least one second target feature vector corresponding to the first target feature vector in a database, wherein the database stores first feature vectors and second feature vectors, and the first feature vectors and the second feature vectors are generated based on the deep learning model;
  determining the data corresponding to the second target feature vector as the target data.
- The method according to claim 2, wherein the determining at least one second target feature vector corresponding to the first target feature vector in a database comprises:
  determining a storage location of the first target feature vector in the database;
  obtaining, based on the storage location and using a nearest neighbor search algorithm, a Euclidean distance between the first target feature vector and a second feature vector;
  determining a second feature vector whose Euclidean distance is less than or equal to a preset threshold as the second target feature vector.
- The method according to claim 3, wherein the displaying the target data comprises:
  displaying the target data corresponding to the second target feature vector in ascending order of Euclidean distance.
- The method according to claim 1, wherein the displaying the target data comprises:
  obtaining a data type and keywords corresponding to the target data;
  displaying a reminder mark in a preset area of the target data in a case where the data type is a preset type and/or the keywords include a preset field.
- The method according to claim 1, wherein after the displaying the target data, the method further comprises:
  receiving a user's input on the target data;
  displaying reminder information in a case where the application program corresponding to the target page is a target application program.
- The method according to claim 1, wherein the determining a word vector corresponding to the target page comprises:
  performing optical character recognition processing on the target page to obtain text information;
  performing keyword extraction on the text information to obtain keyword information;
  performing word embedding processing on the keyword information to obtain the word vector.
- A data recommendation apparatus, comprising:
  a first determining module, configured to determine, in a case where a target page is displayed, a word vector corresponding to the target page;
  a second determining module, configured to input the word vector into a deep learning model and determine target data matching the word vector;
  a first display module, configured to display the target data.
- The apparatus according to claim 8, wherein the second determining module is specifically configured to:
  input the word vector into the deep learning model to obtain a first target feature vector, the first target feature vector being used to represent user behavior;
  determine at least one second target feature vector corresponding to the first target feature vector in a database, wherein the database stores first feature vectors and second feature vectors, and the first feature vectors and the second feature vectors are generated based on the deep learning model;
  determine the data corresponding to the second target feature vector as the target data.
- The apparatus according to claim 9, wherein the second determining module is further specifically configured to:
  determine a storage location of the first target feature vector in the database;
  obtain, based on the storage location and using a nearest neighbor search algorithm, a Euclidean distance between the first target feature vector and a second feature vector;
  determine a second feature vector whose Euclidean distance is less than or equal to a preset threshold as the second target feature vector.
- The apparatus according to claim 10, wherein the first display module is specifically configured to:
  display the target data corresponding to the second target feature vector in ascending order of Euclidean distance.
- The apparatus according to claim 8, wherein the first display module is further specifically configured to:
  obtain a data type and keywords corresponding to the target data;
  display a reminder mark in a preset area of the target data in a case where the data type is a preset type and/or the keywords include a preset field.
- The apparatus according to claim 8, wherein the apparatus further comprises:
  a receiving module, configured to receive a user's input on the target data;
  a second display module, configured to display reminder information in a case where the application program corresponding to the target page is a target application program.
- The apparatus according to claim 8, wherein the first determining module is specifically configured to:
  perform optical character recognition processing on the target page to obtain text information;
  perform keyword extraction on the text information to obtain keyword information;
  perform word embedding processing on the keyword information to obtain the word vector.
- An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the data recommendation method according to any one of claims 1-7.
- A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the data recommendation method according to any one of claims 1-7.
- A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the steps of the data recommendation method according to any one of claims 1-7.
- A computer software product, wherein the computer software product is stored in a non-volatile storage medium, and the computer software product, when executed by at least one processor, implements the steps of the data recommendation method according to any one of claims 1-7.
- An electronic device, configured to execute the steps of the data recommendation method according to any one of claims 1-7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111182460.7 | 2021-10-11 | ||
CN202111182460.7A CN113869063A (en) | 2021-10-11 | 2021-10-11 | Data recommendation method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023061276A1 (en) | 2023-04-20 |
Family
ID=78998970
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/124028 WO2023061276A1 (en) | 2021-10-11 | 2022-10-09 | Data recommendation method and apparatus, electronic device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113869063A (en) |
WO (1) | WO2023061276A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117421486A (en) * | 2023-12-18 | 2024-01-19 | 杭州金智塔科技有限公司 | Recommendation model updating system and method based on ball tree algorithm and federated learning |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113869063A (en) * | 2021-10-11 | 2021-12-31 | 维沃移动通信有限公司 | Data recommendation method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649603A (en) * | 2016-11-25 | 2017-05-10 | 北京资采信息技术有限公司 | Webpage text data sentiment classification designated information push method |
CN111444447A (en) * | 2018-12-29 | 2020-07-24 | 北京奇虎科技有限公司 | Content recommendation page display method and device |
CN111723260A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring recommended content, electronic equipment and readable storage medium |
US20210035244A1 (en) * | 2018-03-27 | 2021-02-04 | Huawei Technologies Co., Ltd. | Scenario-Based Application Recommendation Method and Apparatus |
CN113869063A (en) * | 2021-10-11 | 2021-12-31 | 维沃移动通信有限公司 | Data recommendation method and device, electronic equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407280B (en) * | 2016-08-26 | 2020-02-14 | 合一网络技术(北京)有限公司 | Query target matching method and device |
CN107220386B (en) * | 2017-06-29 | 2020-10-02 | 北京百度网讯科技有限公司 | Information pushing method and device |
CN108228715A (en) * | 2017-12-05 | 2018-06-29 | 深圳市金立通信设备有限公司 | Method, terminal and computer-readable storage medium for displaying images |
CN109447192A (en) * | 2018-09-04 | 2019-03-08 | 西安艾润物联网技术服务有限责任公司 | Information-pushing method and Related product |
CN109783727A (en) * | 2018-12-24 | 2019-05-21 | 东软集团股份有限公司 | Retrieve recommended method, device, computer readable storage medium and electronic equipment |
CN112395606A (en) * | 2020-11-24 | 2021-02-23 | 维沃移动通信有限公司 | Information display method, device, equipment and storage medium |
- 2021-10-11: CN application CN202111182460.7A, published as CN113869063A; status: active, pending
- 2022-10-09: WO application PCT/CN2022/124028, published as WO2023061276A1; status: unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649603A (en) * | 2016-11-25 | 2017-05-10 | 北京资采信息技术有限公司 | Webpage text data sentiment classification designated information push method |
US20210035244A1 (en) * | 2018-03-27 | 2021-02-04 | Huawei Technologies Co., Ltd. | Scenario-Based Application Recommendation Method and Apparatus |
CN111444447A (en) * | 2018-12-29 | 2020-07-24 | 北京奇虎科技有限公司 | Content recommendation page display method and device |
CN111723260A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring recommended content, electronic equipment and readable storage medium |
CN113869063A (en) * | 2021-10-11 | 2021-12-31 | 维沃移动通信有限公司 | Data recommendation method and device, electronic equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117421486A (en) * | 2023-12-18 | 2024-01-19 | 杭州金智塔科技有限公司 | Recommendation model updating system and method based on ball tree algorithm and federated learning |
CN117421486B (en) * | 2023-12-18 | 2024-03-19 | 杭州金智塔科技有限公司 | Recommendation model updating system and method based on ball tree algorithm and federated learning |
Also Published As
Publication number | Publication date |
---|---|
CN113869063A (en) | 2021-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11157577B2 (en) | Method for searching and device thereof | |
US11120078B2 (en) | Method and device for video processing, electronic device, and storage medium | |
CN107102746B (en) | Candidate word generation method and device and candidate word generation device | |
TWI544350B (en) | Input method and system for searching by way of circle | |
CN110019675B (en) | Keyword extraction method and device | |
WO2023061276A1 (en) | Data recommendation method and apparatus, electronic device, and storage medium | |
CN108121736A (en) | Method, device and electronic equipment for establishing a topic word determination model | |
US11734370B2 (en) | Method for searching and device thereof | |
EP3175375A1 (en) | Image based search to identify objects in documents | |
WO2023236866A1 (en) | Input method and apparatus, electronic device, and readable storage medium | |
CN111538830B (en) | French searching method, device, computer equipment and storage medium | |
CN108717403B (en) | Processing method and device for processing | |
CN102968266A (en) | Identification method and apparatus | |
WO2023078414A1 (en) | Related article search method and apparatus, electronic device, and storage medium | |
CN110929122B (en) | Data processing method and device for data processing | |
US20160026613A1 (en) | Processing image to identify object for insertion into document | |
US20130230248A1 (en) | Ensuring validity of the bookmark reference in a collaborative bookmarking system | |
WO2022222821A1 (en) | Information display method and apparatus | |
WO2022257883A1 (en) | Presentation method and presentation apparatus | |
WO2022237877A1 (en) | Information processing method and apparatus, and electronic device | |
CN117609443A (en) | Intelligent interaction method, system, terminal, server and medium based on large model | |
CN110020335B (en) | Favorite processing method and device | |
KR20150135042A (en) | Method for Searching and Device Thereof | |
CN112987941B (en) | Method and device for generating candidate words | |
CN112765447B (en) | Data searching method and device and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22880215; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |