CN117891918B - Interactive data management system of Chinese text vectorization model based on AI PaaS platform - Google Patents
- Publication number
- CN117891918B (application CN202410070601.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- vocabulary
- paas platform
- vector
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of smart office, and in particular to an interactive data management system for a Chinese text vectorization model based on an AI PaaS platform. The system comprises an AI PaaS platform module, a text vectorization module and an interactive data management module. The AI PaaS platform module is used to debug the AI PaaS platform, partition its performance resources, display its performance indicators, and integrate an interactive data management function into the platform. The text vectorization module is used to construct a vocabulary and to vectorize and store the text information to be stored according to that vocabulary. The interactive data management module is used to construct the interactive data management function, through which users search and delete the text data they have stored.
Description
Technical Field
The invention relates to the technical field of smart office, and in particular to an interactive data management system for a Chinese text vectorization model based on an AI PaaS platform.
Background
AI PaaS (artificial intelligence platform as a service) is a cloud computing service model that provides a series of tools and services for building, training and deploying artificial intelligence models. The AI PaaS platform aims to simplify the development process of artificial intelligence applications, enabling developers to more easily utilize advanced machine learning and deep learning techniques.
Text vectorization is the process of converting text data into numeric vectors so that a computer can better understand and process text information. The goal of the text vectorization model is to map semantic information in text into a high-dimensional vector space for machine learning and natural language processing tasks.
The prior art provides no technique for building a text vectorization model on an AI PaaS platform together with an interactive data management system for the vectorized data.
In view of this, the invention proposes an interactive data management system for a Chinese text vectorization model based on an AI PaaS platform.
Disclosure of Invention
This section outlines some aspects of the embodiments of the application and briefly introduces some preferred embodiments. Simplifications or omissions may be made in this section, in the description and in the title of the application; such simplifications or omissions shall not be used to limit the scope of the application.
The present invention has been made in view of the above-described problems.
Therefore, the technical problem solved by the invention is to build a text vectorization model on an AI PaaS platform and to manage the resulting vectorized text data.
To solve this technical problem, the invention provides the following technical solution:
The interactive data management system for a Chinese text vectorization model based on an AI PaaS platform comprises an AI PaaS platform module, a text vectorization module and an interactive data management module. The AI PaaS platform module comprises a debugging unit, a function integration unit, a data transmission unit and a data storage unit. The text vectorization module comprises a vocabulary unit, a word segmentation unit and a vectorization unit. The interactive data management module comprises a data searching unit and a data adjusting unit.
Preferably, the debugging unit is used to create a visual interface, allocate the acquired performance resources of the AI PaaS platform, and display the platform's performance indicators in real time through the visual interface;
The function integration unit is used to integrate required functional models into the AI PaaS platform; the created interactive data management system is imported into the AI PaaS platform through the function integration unit;
The data transmission unit is used for receiving a request instruction sent by the user side to the AI PaaS platform and transmitting feedback data sent by the AI PaaS platform to the user side;
The data storage unit is used to store the vectorized text data that users transmit to the AI PaaS platform, the request instructions they send, and the log data generated while the associated functions of the AI PaaS platform run.
Preferably, the vocabulary unit is used to construct a vector table of Chinese words. A unique vector value is generated for each Chinese word to be stored, and all word vector values are collected into a vocabulary $V$, expressed as $V = \{W_1, W_2, \dots, W_n\}$;
The word segmentation unit is used to segment the input text. The input text is divided into characters and words, which are converted into data form according to their vector values in the vocabulary and then stored. At the character level, the segmented text is expressed as $T_{fi} = [C_1, C_2, \dots, C_n]$; at the word level, it is expressed as $T_{ci} = [Z_1, Z_2, \dots, Z_n]$. A vector-space dimension is added to the vector value of each character and word to represent its position in the text.
Preferably, the vectorization unit is used to vectorize and store texts. An initial identification vector $S$ is generated when a new text is input, and an end identification vector $E$ is generated when input of the text is complete. The stored vector data for a single text is expressed as $X_i = [S, T_{fi}, T_{ci}, E]$;
The initial identification vector $S$ stores the name of the vectorized text, its storage location, and the time at which storage began. The end identification vector $E$ stores the time at which storage ended, the storage space the vectorized text occupies, and the counts of its characters and words.
Preferably, the data searching unit uses search algorithms to search the stored vectorized text; the search algorithms comprise a vocabulary search algorithm and a sentence search algorithm;
The vocabulary search algorithm is constructed as follows:
For the vectorized text data, calculate the frequency of each word in each text and construct a word-text matrix $\mathrm{matrix}(X_i, t_j)$, whose entry is the frequency with which word $t_j$ occurs in vectorized text $X_i$;
For each word $t_j$, create a list $\mathrm{in}(t_j) = \{X_i, \dots\}$ of the texts that contain $t_j$;
For each text $X_i$, calculate the text weight using the TF-IDF method:
$$\mathrm{TF\text{-}IDF}(t_j, X_i) = \mathrm{TF}(t_j, X_i) \times \mathrm{IDF}(t_j)$$
where $\mathrm{TF}(t_j, X_i)$ is the frequency of word $t_j$ in text $X_i$ and $\mathrm{IDF}(t_j)$ is the inverse document frequency of $t_j$; the inverse document frequency $\mathrm{IDF}(t_j)$ is calculated as:
where $N$ is the total number of texts and $n$ is the number of texts containing the word $t_j$;
For a query word $t_j$, find the list of texts containing $t_j$ through the inverted index, then sort the texts by text weight and output them.
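The IDF expression itself is not reproduced in the steps above. The following worked illustration rests on two assumptions that the source does not state: IDF takes the common form $\mathrm{IDF}(t_j) = \ln(N/n)$, and TF is a raw occurrence count. Suppose there are $N = 4$ stored texts, $n = 2$ of them contain $t_j$, and $t_j$ occurs 3 times in $X_i$:

$$\mathrm{IDF}(t_j) = \ln\frac{4}{2} \approx 0.693, \qquad \mathrm{TF\text{-}IDF}(t_j, X_i) = 3 \times 0.693 \approx 2.079$$

Under this form, a word that appears in every text gets $\ln(N/N) = 0$ and therefore contributes nothing to the ranking.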
Preferably, the constructed sentence searching algorithm comprises the following steps:
Design a plurality of hash functions and map the text vectors into different hash buckets through them;
For an input query sentence, vectorize the sentence according to the vocabulary data to generate a query vector;
Map the query vector into the corresponding hash buckets using the constructed hash functions;
Search for similar text vectors in the hash buckets to which the query vector's hash values map;
Calculate the distance between each similar text vector and the query vector;
Select the text vector with the smallest distance and output it.
Preferably, the data adjustment unit is used to adjust stored text content within the scope of the user's permissions, based on the user's identity and permissions;
The adjustment comprises modifying and deleting stored text content. After the user modifies or deletes text content, new vectorized text data is generated from the resulting text;
When users modify or delete the text data they have stored, a log file recording the modification or deletion is generated and stored in the data storage unit of the AI PaaS platform.
A method for an interactive data management system for a Chinese text vectorization model based on an AI PaaS platform, the method comprising:
debugging the AI PaaS platform, partitioning its performance resources, displaying its performance indicators, and integrating an interactive data management function into the AI PaaS platform;
constructing a vocabulary, and vectorizing and storing the text information to be stored according to the vocabulary;
and constructing an interactive data management function through which users search and delete the text data they have stored.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search method.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the AI PaaS platform-based Elasticsearch text vectorization search method.
The beneficial effects of the invention are as follows. Using the AI PaaS platform as a base, the invention develops an interactive data management system for a text vectorization model that allows multiple users to manage text data simultaneously.
Because it builds on the cloud-service characteristics of the AI PaaS platform, the system greatly reduces deployment and operating costs. Its self-built management algorithms optimize the interactive text data management functions, making it more convenient for users to manage the text data they have stored. Its user-permission and data-security settings improve the privacy and security of storing and accessing text data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a block diagram of an interactive data management system of a Chinese text vectorization model based on an AI PaaS platform;
FIG. 2 is a flow chart of an interactive data management method based on a text vectorization model of an AI PaaS platform;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to one embodiment of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides an interactive data management system for a Chinese text vectorization model based on an AI PaaS platform.
Specifically, the system comprises an AI PaaS platform module, a text vectorization module and an interactive data management module.
The AI PaaS platform module comprises a debugging unit, a function integration unit, a data transmission unit and a data storage unit. It is used to debug the AI PaaS platform, partition its performance resources, display its performance indicators, and integrate the interactive data management function into the AI PaaS platform.
The text vectorization module comprises a vocabulary unit, a word segmentation unit and a vectorization unit; the text vectorization module is used for constructing a vocabulary, and vectorizing and storing the text information to be stored according to the vocabulary.
The interactive data management module comprises a data searching unit and a data adjusting unit. It is used to construct the interactive data management function, through which users search and delete the text data they have stored.
The debugging unit is used to create a visual interface, allocate the acquired performance resources of the AI PaaS platform, and display the platform's performance indicators in real time through the visual interface.
The function integration unit is used for integrating a required function model on the AI PaaS platform; the created interactive data management system is imported into the AI PaaS platform through the function integration unit, and the function integration unit can be used for integrating other functions and expanding the functions of the constructed AI PaaS platform module.
Through the function integration unit, the trained model and the developed system can be deployed in the AI PaaS platform.
The data transmission unit is used for receiving a request instruction sent by the user side to the AI PaaS platform and transmitting feedback data sent by the AI PaaS platform to the user side.
The transmission unit encodes and decodes the data to be transmitted, supports different transmission protocols, selects the appropriate protocol based on actual communication requirements, and encrypts data before transmitting it.
The data storage unit is used to store the vectorized text data that users transmit to the AI PaaS platform, the request instructions they send, and the log data generated while the associated functions of the AI PaaS platform run.
The data storage unit is a database whose performance is configured according to actual requirements.
The vocabulary unit is used to construct a vector table of Chinese words. A unique vector value is generated for each Chinese word to be stored, and all word vector values are collected into a vocabulary $V$, expressed as $V = \{W_1, W_2, \dots, W_n\}$.
The vocabulary is pre-generated from the range of text data that needs to be stored. When a word outside that range appears, a vector is generated for it immediately and the word is recorded in the vocabulary.
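The following minimal Python sketch of the vocabulary unit rests on two assumptions. Words are keyed by their string form, and each vector value is a seeded random placeholder, because the source does not say how vector values are derived. The class name `Vocabulary`, its methods, and the example words are hypothetical. The sketch shows the two behaviors described above: pre-generation from an expected data range, and immediate vector generation for an out-of-range word.

```python
import random
from typing import Dict, List


class Vocabulary:
    """Hypothetical vocabulary V = {W_1, ..., W_n}: word -> unique index -> vector."""

    def __init__(self, dim: int = 8, seed: int = 0) -> None:
        self.dim = dim
        self._rng = random.Random(seed)       # seeded so vectors are reproducible
        self.index: Dict[str, int] = {}
        self.vectors: List[List[float]] = []

    def add(self, word: str) -> int:
        # Return the existing index, or generate a vector right away for a new
        # (out-of-range) word and record it in the vocabulary.
        if word not in self.index:
            self.index[word] = len(self.vectors)
            self.vectors.append([self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)])
        return self.index[word]

    def vector(self, word: str) -> List[float]:
        return self.vectors[self.add(word)]

    def __contains__(self, word: str) -> bool:
        return word in self.index

    def __len__(self) -> int:
        return len(self.vectors)


# Pre-generate the vocabulary from the expected data range.
vocab = Vocabulary(dim=4)
for w in ["数据", "管理", "文本", "向量"]:
    vocab.add(w)

print("平台" in vocab)               # False
vocab.vector("平台")                 # out-of-range word: vectorized on first sight
print("平台" in vocab, len(vocab))   # True 5
```

Storing vectors by index keeps each word's mapping stable as new words are appended. A deployed system would more likely take its vectors from a trained embedding model.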
The word segmentation unit is used to segment the input text. The input text is divided into characters and words, which are converted into data form according to their vector values in the vocabulary and then stored. At the character level, the segmented text is expressed as $T_{fi} = [C_1, C_2, \dots, C_n]$; at the word level, it is expressed as $T_{ci} = [Z_1, Z_2, \dots, Z_n]$. A vector-space dimension is added to the vector value of each character and word to represent its position in the text.
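The segmentation step can be sketched as follows. The source does not name a Chinese segmentation method, so this sketch makes three assumptions:

- Words are found by greedy forward maximum matching against the vocabulary.
- Unknown tokens are represented by a zero vector, whereas the vocabulary unit described above would generate a new vector.
- The position index is appended as one extra dimension.

All function names and the toy vector table are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def forward_max_match(text: str, words: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; unmatched spans fall back to single characters."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


def with_position(vectors: List[Vector]) -> List[Vector]:
    # Append one extra dimension holding each token's position in the text.
    return [vec + [float(pos)] for pos, vec in enumerate(vectors)]


def segment(text: str, table: Dict[str, Vector], zero: Vector) -> Tuple[List[str], List[Vector], List[Vector]]:
    words = forward_max_match(text, set(table))
    t_f = with_position([table.get(c, zero) for c in text])    # T_fi: character level
    t_c = with_position([table.get(w, zero) for w in words])   # T_ci: word level
    return words, t_f, t_c


table = {"文本": [1.0, 0.0], "向量": [0.0, 1.0], "化": [0.2, 0.2],
         "文": [0.1, 0.0], "本": [0.0, 0.1], "向": [0.3, 0.0], "量": [0.0, 0.3]}
words, t_f, t_c = segment("文本向量化", table, zero=[0.0, 0.0])
print(words)  # ['文本', '向量', '化']
```

Because the position is appended as a separate dimension, the original word vectors in `table` are left unmodified.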
The vectorization unit is used to vectorize and store texts. An initial identification vector $S$ is generated when a new text is input, and an end identification vector $E$ is generated when input of the text is complete. The stored vector data for a single text is expressed as $X_i = [S, T_{fi}, T_{ci}, E]$.
The initial identification vector $S$ stores the name of the vectorized text, its storage location, and the time at which storage began. The end identification vector $E$ stores the time at which storage ended, the storage space the vectorized text occupies, and the counts of its characters and words.
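The Python sketch below shows one way to hold the per-text record $X_i = [S, T_{fi}, T_{ci}, E]$. It is illustrative only. The field names, the `build_record` helper, the storage path, and the size estimate (UTF-8 bytes of the text plus 8 bytes per stored float) are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class StartMarker:
    """S: name, storage location and start-of-storage time of the text."""
    name: str
    location: str
    started_at: float


@dataclass
class EndMarker:
    """E: end-of-storage time, space occupied, and character/word counts."""
    ended_at: float
    size_bytes: int
    num_chars: int
    num_words: int


@dataclass
class TextRecord:
    """X_i = [S, T_fi, T_ci, E] for a single stored text."""
    start: StartMarker
    t_f: List[Vector]   # character-level vectors T_fi
    t_c: List[Vector]   # word-level vectors T_ci
    end: EndMarker


def build_record(name: str, location: str, text: str,
                 t_f: List[Vector], t_c: List[Vector]) -> TextRecord:
    start = StartMarker(name, location, time.time())
    # Assumed size estimate: UTF-8 bytes of the raw text plus 8 bytes per float.
    floats = sum(len(v) for v in t_f) + sum(len(v) for v in t_c)
    size = len(text.encode("utf-8")) + 8 * floats
    end = EndMarker(time.time(), size, num_chars=len(t_f), num_words=len(t_c))
    return TextRecord(start, t_f, t_c, end)


rec = build_record("doc1", "/store/doc1", "文本",
                   t_f=[[1.0, 0.0], [0.0, 1.0]], t_c=[[1.0, 1.0]])
print(rec.end.size_bytes, rec.end.num_chars, rec.end.num_words)  # 54 2 1
```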
The data searching unit uses search algorithms to search the stored vectorized text; the search algorithms comprise a vocabulary search algorithm and a sentence search algorithm.
The vocabulary search algorithm is constructed as follows:
For the vectorized text data, calculate the frequency of each word in each text and construct a word-text matrix $\mathrm{matrix}(X_i, t_j)$, whose entry is the frequency with which word $t_j$ occurs in vectorized text $X_i$;
For each word $t_j$, create a list $\mathrm{in}(t_j) = \{X_i, \dots\}$ of the texts that contain $t_j$;
For each text $X_i$, calculate the text weight using the TF-IDF method:
$$\mathrm{TF\text{-}IDF}(t_j, X_i) = \mathrm{TF}(t_j, X_i) \times \mathrm{IDF}(t_j)$$
where $\mathrm{TF}(t_j, X_i)$ is the frequency of word $t_j$ in text $X_i$ and $\mathrm{IDF}(t_j)$ is the inverse document frequency of $t_j$;
the inverse document frequency $\mathrm{IDF}(t_j)$ is calculated as:
where $N$ is the total number of texts and $n$ is the number of texts containing the word $t_j$;
For a query word $t_j$, find the list of texts containing $t_j$ through the inverted index, then sort the texts by text weight and output them.
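The vocabulary search can be sketched as an inverted index with TF-IDF ranking. The IDF formula is not reproduced in the text, so this sketch assumes the common form $\ln(N/n)$ and uses raw occurrence counts as TF. `build_index`, `search`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def build_index(docs: Dict[str, List[str]]):
    """docs maps a text id X_i to its list of segmented words."""
    tf: Dict[str, Counter] = {doc_id: Counter(words) for doc_id, words in docs.items()}  # matrix(X_i, t_j)
    inverted: Dict[str, Set[str]] = defaultdict(set)  # in(t_j) = {X_i, ...}
    for doc_id, counts in tf.items():
        for word in counts:
            inverted[word].add(doc_id)
    return tf, inverted


def search(term: str, tf, inverted, total_docs: int) -> List[Tuple[str, float]]:
    postings = inverted.get(term, set())   # .get does not insert into the defaultdict
    if not postings:
        return []
    # Assumed IDF form: ln(N / n), N = total texts, n = texts containing the term.
    idf = math.log(total_docs / len(postings))
    scored = [(doc_id, tf[doc_id][term] * idf) for doc_id in postings]
    # Highest weight first; ties broken by text id for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


docs = {"X1": ["数据", "管理", "数据"], "X2": ["数据", "文本"], "X3": ["向量", "文本"]}
tf, inverted = build_index(docs)
print(search("数据", tf, inverted, total_docs=len(docs)))  # X1 ranks first (the word occurs twice)
```

A production deployment would normally persist the inverted lists rather than rebuild them in memory, but the ranking logic stays the same.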
The constructed sentence searching algorithm comprises the following steps:
Design a plurality of hash functions and map the text vectors into different hash buckets through them;
For an input query sentence, vectorize the sentence according to the vocabulary data to generate a query vector;
Map the query vector into the corresponding hash buckets using the constructed hash functions;
Search for similar text vectors in the hash buckets to which the query vector's hash values map;
Calculate the distance between each similar text vector and the query vector;
Select the text vector with the smallest distance and output it.
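The sentence search steps follow the general pattern of locality-sensitive hashing (LSH). The source does not specify the hash functions or the distance measure, so this sketch makes three assumptions:

- Hashing uses random hyperplanes (sign bits) across several independent tables.
- Candidates are ranked by Euclidean distance.
- Each sentence has already been vectorized into a single fixed-length vector.

`HyperplaneLSH` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Vector = List[float]


class HyperplaneLSH:
    """Random-hyperplane LSH: several hash tables, each keyed by a tuple of sign bits."""

    def __init__(self, dim: int, num_tables: int = 4, bits: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One hash function per table, built from `bits` random hyperplanes.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
                       for _ in range(num_tables)]
        self.tables: List[Dict[Tuple[int, ...], List[str]]] = [defaultdict(list) for _ in range(num_tables)]
        self.vectors: Dict[str, Vector] = {}

    def _key(self, t: int, vec: Vector) -> Tuple[int, ...]:
        # Bit = 1 if the vector lies on the non-negative side of the hyperplane.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in self.planes[t])

    def add(self, doc_id: str, vec: Vector) -> None:
        self.vectors[doc_id] = vec
        for t in range(len(self.planes)):
            self.tables[t][self._key(t, vec)].append(doc_id)

    def query(self, vec: Vector) -> Optional[Tuple[str, float]]:
        # Gather candidates from every bucket the query vector hashes into.
        candidates = set()
        for t in range(len(self.planes)):
            candidates.update(self.tables[t].get(self._key(t, vec), []))
        if not candidates:
            return None
        # Return the nearest candidate by Euclidean distance.
        return min(((d, math.dist(self.vectors[d], vec)) for d in candidates),
                   key=lambda item: (item[1], item[0]))


lsh = HyperplaneLSH(dim=3)
lsh.add("X1", [1.0, 0.0, 0.0])
lsh.add("X2", [0.0, 1.0, 0.0])
lsh.add("X3", [0.9, 0.1, 0.0])
print(lsh.query([1.0, 0.0, 0.0]))  # ('X1', 0.0)
```

Adding tables raises the chance that a true near neighbor shares at least one bucket with the query. Adding bits per table makes buckets smaller and candidate lists shorter.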
The data adjustment unit adjusts stored text content within the scope of the user's permissions, based on the user's identity and permissions.
The adjustment comprises modifying and deleting stored text content. After the user modifies or deletes text content, new vectorized text data is generated from the resulting text.
When users modify or delete the text data they have stored, a log file recording the modification or deletion is generated and stored in the data storage unit of the AI PaaS platform.
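The following minimal sketch of the permission-checked adjustment flow makes three assumptions:

- A user may act only on texts they own, and only with the matching permission.
- Each log entry is a JSON line appended to an in-memory list that stands in for the data storage unit.
- Deletion removes a whole text; partial deletion would be handled as a modification.

`Store`, `modify`, `delete`, and the `vectorize` callback are hypothetical names.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Store:
    texts: Dict[str, str] = field(default_factory=dict)       # text id -> raw text
    owners: Dict[str, str] = field(default_factory=dict)      # text id -> owning user
    vectors: Dict[str, object] = field(default_factory=dict)  # text id -> vectorized data
    log: List[str] = field(default_factory=list)              # stand-in for the data storage unit


def _check(store: Store, user: str, perms: Set[str], doc_id: str, action: str) -> None:
    # Allow the action only on the user's own texts and only with the matching permission.
    if store.owners.get(doc_id) != user or action not in perms:
        raise PermissionError(f"{user} may not {action} {doc_id}")


def _log(store: Store, user: str, action: str, doc_id: str) -> None:
    entry = {"time": time.time(), "user": user, "action": action, "doc": doc_id}
    store.log.append(json.dumps(entry, ensure_ascii=False))


def modify(store: Store, user: str, perms: Set[str], doc_id: str,
           new_text: str, vectorize: Callable[[str], object]) -> None:
    _check(store, user, perms, doc_id, "modify")
    store.texts[doc_id] = new_text
    store.vectors[doc_id] = vectorize(new_text)  # regenerate vectorized data for the new text
    _log(store, user, "modify", doc_id)


def delete(store: Store, user: str, perms: Set[str], doc_id: str) -> None:
    _check(store, user, perms, doc_id, "delete")
    for table in (store.texts, store.owners, store.vectors):
        table.pop(doc_id, None)
    _log(store, user, "delete", doc_id)


store = Store(texts={"X1": "旧文本"}, owners={"X1": "alice"})
modify(store, "alice", {"modify", "delete"}, "X1", "新文本", lambda s: [float(len(s))])
```

Because the permission check runs before any mutation, a rejected request changes no state and writes no log entry.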
Example 2
The second embodiment of the invention provides an interactive data management method based on a text vectorization model of an AI PaaS platform.
S1: Debug the AI PaaS platform, partition its performance resources, display its performance indicators, and integrate the interactive data management function into the AI PaaS platform.
S101: Create a visual interface, allocate the acquired performance resources of the AI PaaS platform, and display its performance indicators in real time through the visual interface.
S102: Integrate the required functional models into the AI PaaS platform, and import the created interactive data management system into it through the function integration unit.
S103: Receive request instructions sent from the user terminal to the AI PaaS platform, and transmit feedback data from the AI PaaS platform to the user terminal.
S104: Store the vectorized text data that users transmit to the AI PaaS platform, the request instructions they send, and the log data generated while the associated functions of the AI PaaS platform run.
S2: Construct a vocabulary, and vectorize and store the text information to be stored according to the vocabulary.
S201: Construct a vector table of Chinese words: generate a unique vector value for each Chinese word to be stored, and collect all word vector values into the vocabulary.
S202: Divide the input text into characters and words, and convert them into data form according to their vector values in the vocabulary for storage.
S203: Vectorize and store the text: generate an initial identification vector S when a new text is input, and an end identification vector E when input of the text is complete.
S3: Construct the interactive data management function, through which users search and delete the text data they have stored.
S301: Use search algorithms to search the stored text content.
S302: Modify and delete the stored text content, and generate new vectorized text data from the resulting text after the user modifies or deletes it.
Example 3
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, an electronic device 500 is also provided in accordance with yet another aspect of the present application. The electronic device 500 may include one or more processors and one or more memories. The memories store computer-readable code which, when executed by the one or more processors, can perform the multi-source heterogeneous data driven intelligent manufacturing decision method described above.
The method or system according to embodiments of the application may also be implemented by means of the architecture of the electronic device shown in fig. 3. As shown in fig. 3, the electronic device 500 may include a bus 501, one or more CPUs 502, a Read Only Memory (ROM) 503, a Random Access Memory (RAM) 504, a communication port 505 connected to a network, an input/output component 506, a hard disk 507, and the like. A storage device in the electronic device 500, such as the ROM 503 or the hard disk 507, may store the program for the multi-source heterogeneous data driven intelligent manufacturing decision method provided by the present application. This multi-source heterogeneous data driven intelligent manufacturing decision method comprises the following steps:
- debugging an AI PaaS platform, partitioning its performance resources, displaying its performance indicators, and integrating an interactive data management function into the AI PaaS platform;
- constructing a vocabulary, and vectorizing and storing the text information to be stored according to the vocabulary;
- constructing an interactive data management function through which users search and delete the text data they have stored.

Further, the electronic device 500 may also include a user interface 508. The architecture shown in fig. 3 is merely exemplary, and one or more of its components may be omitted as required when implementing different devices.
Example 4
Fig. 4 is a schematic diagram of a computer-readable storage medium 600 according to an embodiment of the present application. The computer-readable storage medium 600 stores computer-readable instructions that, when executed by a processor, perform the interactive data management method for the Chinese text vectorization model according to the embodiments of the present application described with reference to the above figures. The storage medium 600 includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory.
In addition, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the present application provides a non-transitory machine-readable storage medium storing machine-readable instructions that a processor can execute to perform the method steps provided by the present application, such as: debugging the AI PaaS platform, dividing the performance of the AI PaaS platform, displaying the performance indexes of the AI PaaS platform, and integrating an interactive data management function into the AI PaaS platform; constructing a vocabulary, and vectorizing and storing the text information to be stored according to the vocabulary; and constructing an interactive data management function through which users search and delete the text data they have stored.
The methods, apparatuses, and devices of the present application may be implemented in many ways, for example in software, hardware, firmware, or any combination thereof. The order of the method steps described above is for illustration only; unless specifically stated otherwise, the steps of the method of the present application are not limited to that order. Furthermore, in some embodiments, the present application may be embodied as programs recorded on a recording medium, the programs comprising machine-readable instructions for implementing the method according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
In addition, in the technical solutions provided in the embodiments of the present application, parts whose implementation principles are the same as those of the corresponding prior-art solutions are not described in detail, to avoid redundancy.
The foregoing detailed description further explains the objectives, technical solutions, and beneficial effects of the invention. It should be understood that the above describes only specific embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. An interactive data management system for a Chinese text vectorization model based on an AI PaaS platform, characterized by comprising an AI PaaS platform module, a text vectorization module, and an interactive data management module;
The AI PaaS platform module comprises a debugging unit, a function integration unit, a data transmission unit and a data storage unit; the AI PaaS platform module is used for debugging the AI PaaS platform, dividing the performance of the AI PaaS platform, displaying the performance index of the AI PaaS platform, and integrating an interactive data management function in the AI PaaS platform;
The text vectorization module comprises a vocabulary unit, a word segmentation unit, and a vectorization unit; the text vectorization module is used for constructing a vocabulary, segmenting input text into characters and words, and vectorizing and storing the text information to be stored according to the vocabulary;
The interactive data management module comprises a data searching unit and a data adjusting unit; the interactive data management module is used for constructing an interactive data management function through which users search and prune (modify or delete) the text data they have stored;
the debugging unit is used for creating a visual interface, distributing the acquired performance of the AI PaaS platform and displaying the performance index of the AI PaaS platform in real time through the visual interface;
The function integration unit is used for integrating required functional models into the AI PaaS platform; the created interactive data management system is imported into the AI PaaS platform through the function integration unit;
The data transmission unit is used for receiving a request instruction sent by the user side to the AI PaaS platform and transmitting feedback data sent by the AI PaaS platform to the user side;
the data storage unit is used for storing vectorized text data transmitted to the AI PaaS platform by a user, a transmitted request instruction and log data generated during the operation of the associated function of the AI PaaS platform;
the vocabulary unit is used for constructing a vector table of Chinese words: a unique corresponding vector value is generated for each Chinese word to be stored, and all word vector values are integrated into a vocabulary; the vocabulary V is expressed as: ;
the word segmentation unit is used for segmenting the input text: the input text is divided into characters and words, which are converted into data form and stored according to the vector values in the vocabulary; if the input is a character, it is expressed as: ; if the input is a word, it is expressed as: ; a vector-space dimension is added to the vector value of each character and word to represent the position of that character or word in the text;
The vectorization unit is used for vectorizing and storing texts, and generating an initial identification vector S when a new text is input; when the text input is completed, an end identification vector E is generated; the stored vector data for a single text is represented as: ;
The initial identification vector S stores name information of the vectorized text, storage position information of the vectorized text and time information for beginning to store the vectorized text; the end identification vector E stores time information for ending storage of the vectorized text, space size occupation information of the vectorized text and quantity statistical information of characters and words of the vectorized text;
the data searching unit adopts a searching algorithm and is used for searching the stored vectorized text; the search algorithm comprises a vocabulary search algorithm and a sentence search algorithm;
The steps of the constructed vocabulary searching algorithm are as follows:
for the vectorized text data, calculate the frequency of each word in each text and construct a word-text matrix $M$, where $M_{ij}$ denotes the frequency with which word $w_i$ occurs in vectorized text $d_j$;
For each word $w_i$, create a list $L(w_i)$ of the texts that contain $w_i$, i.e. the list of texts in which $w_i$ appears (the inverted index);
For each text $d_j \in L(w_i)$, calculate the text weight using TF-IDF:
$$\mathrm{TFIDF}(w_i, d_j) = \mathrm{TF}(w_i, d_j) \times \mathrm{IDF}(w_i)$$
where $\mathrm{TF}(w_i, d_j)$ is the frequency of word $w_i$ in text $d_j$ (i.e. $M_{ij}$), and $\mathrm{IDF}(w_i)$ is the inverse text frequency of $w_i$;
The inverse text frequency $\mathrm{IDF}(w_i)$ is calculated as:
$$\mathrm{IDF}(w_i) = \log\frac{N}{n}$$
where $N$ is the total number of texts and $n$ is the number of texts containing $w_i$;
For a query word $w_q$, find the texts containing $w_q$ through the inverted index, sort them by text weight, and output them;
The constructed sentence searching algorithm comprises the following steps:
designing a plurality of hash functions, and mapping the text vectors into different hash buckets through these hash functions;
For an input query sentence, vectorizing the sentence according to the vocabulary data to generate a query vector;
For the query vector, mapping it into the corresponding hash bucket(s) using the constructed hash functions;
Searching for similar text vectors in the hash bucket(s) to which the hash values of the query vector map;
calculating the distance between each similar text vector and the query vector;
and selecting and outputting the text vector with the smallest distance.
2. The interactive data management system for a Chinese text vectorization model based on an AI PaaS platform according to claim 1, wherein the data adjusting unit allows a user to adjust stored text content within the scope of the user's own permissions, based on the user's identity and authority;
the adjustment comprises modifying and deleting stored text content, and after the user modifies or deletes the text content, new vectorized text data is generated from the new text content;
when a user modifies or deletes the text data they have stored, a log file recording the modification or deletion is generated and stored in the data storage unit of the AI PaaS platform.
3. A method of the interactive data management system for a Chinese text vectorization model based on an AI PaaS platform according to claim 1 or 2, characterized in that the method comprises:
Debugging an AI PaaS platform, dividing the performance of the AI PaaS platform, displaying the performance index of the AI PaaS platform, and integrating an interactive data management function in the AI PaaS platform;
constructing a vocabulary, and vectorizing and storing the text information to be stored according to the vocabulary;
and constructing an interactive data management function, and searching and deleting the text data stored by the user.
4. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of claim 3 when executing the computer program.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method as claimed in claim 3.
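The vocabulary unit and word segmentation unit of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify how vector values are generated or how text is segmented, so this sketch assumes that each vocabulary entry receives a unique integer id as its "vector value", that segmentation uses forward maximum matching against the vocabulary (a common technique swapped in here, not necessarily the patent's), and that the added position dimension is the token's index in the text. The names `build_vocabulary`, `segment`, `encode`, and `UNKNOWN_ID` are hypothetical.

```python
from typing import Dict, List, Tuple

UNKNOWN_ID = 0  # hypothetical id for characters missing from the vocabulary


def build_vocabulary(words: List[str]) -> Dict[str, int]:
    """Give every distinct word a unique integer id (its hypothetical 'vector value')."""
    vocab: Dict[str, int] = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # ids start at 1; 0 is reserved
    return vocab


def segment(text: str, vocab: Dict[str, int], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary
    entry (up to max_len characters), otherwise a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Map each token to (vector value, position); position is the extra dimension."""
    return [(vocab.get(tok, UNKNOWN_ID), pos) for pos, tok in enumerate(tokens)]


vocab = build_vocabulary(["人工智能", "平台", "文本", "向量"])
tokens = segment("人工智能平台存储文本", vocab)
print(tokens)                 # ['人工智能', '平台', '存', '储', '文本']
print(encode(tokens, vocab))  # [(1, 0), (2, 1), (0, 2), (0, 3), (3, 4)]
```

Characters absent from the vocabulary fall back to single-character tokens with the placeholder id 0; a real system might instead assign every character its own entry or use learned embeddings as vector values.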
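In claim 1, the vectorization unit wraps each stored text between a start identification vector S (name, storage location, start time) and an end identification vector E (end time, occupied size, character and word counts). The Python sketch below shows one possible record layout. The field names, the use of dataclasses instead of literal vectors, the rule that single-character tokens count as characters and longer tokens as words, and the size measure (bytes of the (value, position) pairs packed as 32-bit integers) are all assumptions made for illustration.

```python
import struct
import time
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StartMarker:
    """Hypothetical contents of the start identification vector S."""
    name: str
    location: str
    start_time: float


@dataclass
class EndMarker:
    """Hypothetical contents of the end identification vector E."""
    end_time: float
    size_bytes: int
    char_count: int
    word_count: int


@dataclass
class StoredText:
    start: StartMarker
    body: List[Tuple[int, int]]  # (vector value, position) pairs
    end: EndMarker


def store_text(name: str, location: str, tokens: List[str],
               body: List[Tuple[int, int]]) -> StoredText:
    """Wrap an already-encoded text between S and E markers."""
    if len(tokens) != len(body):
        raise ValueError("each token needs exactly one encoded entry")
    start = StartMarker(name, location, time.time())
    # Assumption: stored size = body packed as little-endian 32-bit ints.
    flat = [x for pair in body for x in pair]
    size = len(struct.pack(f"<{len(flat)}i", *flat))
    chars = sum(1 for tok in tokens if len(tok) == 1)  # single-character tokens
    end = EndMarker(time.time(), size, chars, len(tokens) - chars)
    return StoredText(start, list(body), end)


record = store_text("doc-1", "store/doc-1", ["人工智能", "平台", "存"],
                    [(1, 0), (2, 1), (0, 2)])
print(record.end.size_bytes, record.end.char_count, record.end.word_count)  # 24 1 2
```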
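The vocabulary search algorithm in claim 1 (word-text frequencies, an inverted index, and TF-IDF ranking with $\mathrm{IDF} = \log(N/n)$) can be sketched in Python as follows. This assumes the texts are already segmented into tokens and that term frequency is the count divided by the text length; the function names `build_index` and `search_word` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def build_index(texts: Dict[str, List[str]]):
    """texts maps a text id to its segmented tokens.
    Returns the word-text frequency table (a sparse form of matrix M)
    and the inverted index word -> list of text ids containing it."""
    tf: Dict[str, Dict[str, float]] = {}
    inverted: Dict[str, List[str]] = {}
    for text_id, tokens in texts.items():
        counts = Counter(tokens)
        total = len(tokens)
        tf[text_id] = {word: c / total for word, c in counts.items()}
        for word in counts:
            inverted.setdefault(word, []).append(text_id)
    return tf, inverted


def search_word(word: str, tf, inverted) -> List[Tuple[str, float]]:
    """Return (text id, TF-IDF weight) pairs for texts containing `word`, best first."""
    text_ids = inverted.get(word, [])
    if not text_ids:
        return []
    idf = math.log(len(tf) / len(text_ids))  # IDF = log(N / n)
    scored = [(tid, tf[tid][word] * idf) for tid in text_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


texts = {
    "t1": ["文本", "向量", "文本"],
    "t2": ["向量", "平台"],
    "t3": ["平台", "平台", "存储"],
}
tf, inverted = build_index(texts)
print([tid for tid, _ in search_word("平台", tf, inverted)])  # ['t3', 't2']
```

With this unsmoothed IDF, a word that occurs in every text gets weight 0; smoothed IDF variants avoid that if it matters.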
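The sentence search algorithm in claim 1 names multiple hash functions and hash buckets but no specific hash family or distance metric. The Python sketch below assumes random-hyperplane locality-sensitive hashing (sign-bit signatures over several independent tables) and ranks candidates by cosine distance, the angle-based measure this hash family is designed for. The class `HyperplaneLSH`, the helper `cosine_distance`, and the parameters `bits`, `tables`, and `seed` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class HyperplaneLSH:
    """Random-hyperplane LSH: each table hashes a vector to a tuple of sign bits."""

    def __init__(self, dim: int, bits: int = 8, tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # One set of `bits` random hyperplanes (hash functions) per table.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
                       for _ in range(tables)]
        self.buckets: List[Dict[Tuple[int, ...], List[str]]] = [{} for _ in range(tables)]
        self.vectors: Dict[str, Sequence[float]] = {}

    @staticmethod
    def _signature(planes, vec) -> Tuple[int, ...]:
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

    def add(self, text_id: str, vec: Sequence[float]) -> None:
        self.vectors[text_id] = vec
        for planes, table in zip(self.planes, self.buckets):
            table.setdefault(self._signature(planes, vec), []).append(text_id)

    def query(self, vec: Sequence[float], k: int = 1) -> List[Tuple[str, float]]:
        # Candidates: every text sharing a bucket with the query in any table.
        candidates = set()
        for planes, table in zip(self.planes, self.buckets):
            candidates.update(table.get(self._signature(planes, vec), []))
        scored = [(tid, cosine_distance(self.vectors[tid], vec)) for tid in candidates]
        return sorted(scored, key=lambda pair: pair[1])[:k]


lsh = HyperplaneLSH(dim=3)
lsh.add("a", [1.0, 0.0, 0.0])
lsh.add("b", [0.9, 0.1, 0.0])
lsh.add("c", [0.0, 0.0, 1.0])
print(lsh.query([2.0, 0.0, 0.0]))  # [('a', 0.0)]
```

More bits per table make buckets more selective, while more tables raise the chance that a true neighbour shares at least one bucket with the query. If no bucket matches, `query` returns an empty list.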
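The data adjusting unit of claim 2 checks the user's identity and authority, regenerates vectorized data after a modification, and logs every modification or deletion. The following is a minimal Python sketch under stated assumptions: owners may change their own texts, a hypothetical "admin" role may change any text, the vectorizer is passed in as a function, and an in-memory list of JSON lines stands in for the log files kept in the data storage unit. The names `Record` and `TextStore` are hypothetical.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Record:
    owner: str
    text: str
    vectors: List[int]


@dataclass
class TextStore:
    records: Dict[str, Record] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # stands in for stored log files

    def _authorize(self, user: str, roles: Set[str], text_id: str) -> Record:
        record = self.records[text_id]  # KeyError if the text does not exist
        # Hypothetical rule: owners may edit their own texts; "admin" may edit any.
        if record.owner != user and "admin" not in roles:
            raise PermissionError(f"{user} may not change {text_id}")
        return record

    def _audit(self, user: str, action: str, text_id: str) -> None:
        entry = {"time": time.time(), "user": user, "action": action, "text_id": text_id}
        self.log.append(json.dumps(entry))

    def modify(self, user: str, roles: Set[str], text_id: str, new_text: str,
               vectorize: Callable[[str], List[int]]) -> None:
        record = self._authorize(user, roles, text_id)
        record.text = new_text
        record.vectors = vectorize(new_text)  # regenerate vectorized data
        self._audit(user, "modify", text_id)

    def delete(self, user: str, roles: Set[str], text_id: str) -> None:
        self._authorize(user, roles, text_id)
        del self.records[text_id]
        self._audit(user, "delete", text_id)


store = TextStore({"t1": Record("alice", "旧文本", [1, 2])})
# Toy vectorizer for illustration only: code points of each character.
store.modify("alice", set(), "t1", "新文本", lambda t: [ord(c) for c in t])
```

A rejected request raises `PermissionError` before anything is written, so only successful changes appear in the log.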
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410070601.3A CN117891918B (en) | 2024-01-17 | 2024-01-17 | Interactive data management system of Chinese text vectorization model based on AI PaaS platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117891918A CN117891918A (en) | 2024-04-16 |
CN117891918B true CN117891918B (en) | 2024-09-03 |
Family
ID=90645437
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117891918B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250526A (en) * | 2016-08-05 | 2016-12-21 | 浪潮电子信息产业股份有限公司 | Text recommendation method and device based on content and user behavior |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10402090B1 (en) * | 2016-09-30 | 2019-09-03 | EMC IP Holding Company LLC | Data service protection for cloud management platforms |
US11423069B2 (en) * | 2018-09-19 | 2022-08-23 | Servicenow, Inc. | Data structures for efficient storage and updating of paragraph vectors |
US11321312B2 (en) * | 2019-01-14 | 2022-05-03 | ALEX—Alternative Experts, LLC | Vector-based contextual text searching |
CN112257421B (en) * | 2020-12-21 | 2021-04-23 | 完美世界(北京)软件科技发展有限公司 | Nested entity data identification method and device and electronic equipment |
CN112817916B (en) * | 2021-02-07 | 2023-03-31 | 中国科学院新疆理化技术研究所 | Data acquisition method and system based on IPFS |
CN114706950A (en) * | 2022-03-30 | 2022-07-05 | 易薪路网络科技(上海)有限公司 | Long text data retrieval method, device, equipment and storage medium |
CN117056465A (en) * | 2023-08-22 | 2023-11-14 | 上海极目银河数字科技有限公司 | Vector searching method, system, electronic device and storage medium |
Non-Patent Citations (1)
Title |
---|
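"Research and Application of Key Technologies for a Railway Engineering Document Platform" (铁路工程文档平台关键技术研究与应用); 解亚龙 (Xie Yalong) et al.; 铁道科学与工程学报 (Journal of Railway Science and Engineering); 2020-08-15; Vol. 17, No. 8, pp. 2142-2151 * |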
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |