CN118486310B

CN118486310B - Vehicle-mounted voice guiding method and device and vehicle-mounted terminal

Info

Publication number: CN118486310B
Application number: CN202410947003.XA
Authority: CN
Inventors: 胡昌菊; 刘楚雄; 宋亮; 钟远健; 王天祥; 吴晓亚
Original assignee: Chengdu Seres Technology Co Ltd
Current assignee: Chengdu Seres Technology Co Ltd
Priority date: 2024-07-16
Filing date: 2024-07-16
Publication date: 2024-09-20
Anticipated expiration: 2044-07-16
Also published as: CN118486310A

Abstract

The application provides a vehicle-mounted voice guiding method, a device and a vehicle-mounted terminal. The method comprises the following steps: according to a recommendation rule corresponding to the in-vehicle scene, an optimal frequent set is determined in a pre-built frequent set database; the semantic intention corresponding to the optimal frequent set is determined to be the optimal semantic intention; target text data is obtained through the optimal semantic intention and a pre-built intention database; and the target text data is converted into a voice instruction, which is sent to a vehicle-mounted terminal so as to guide a target user. By providing voice guidance for multiple scenes and establishing a different recommendation method for each scene, personalized vehicle-mounted function recommendation is carried out for the target user, so that vehicle-mounted functions that match the user's actual intention are recommended in different use scenes, and the user's interaction efficiency and comfort of use are effectively improved.

Description

Vehicle-mounted voice guiding method and device and vehicle-mounted terminal

Technical Field

The invention relates to the technical field of vehicle-mounted voice recognition, in particular to a vehicle-mounted voice guiding method and device and a vehicle-mounted terminal.

Background

With the continuous development of vehicle technology and its growing intelligence, the vehicle-mounted voice interaction system has become an important component of modern automobiles. The vehicle-mounted voice interaction system enables voice interaction between a driver and a vehicle, and provides functions such as navigation, music playing and phone calls. Within the vehicle-mounted voice interaction system, voice guidance is an important component: it helps a user get started quickly, guides the user to discover and understand functions, and improves the user experience.

However, existing vehicle-mounted voice interaction systems mainly handle voice guidance as follows: a random strategy is used to extract data from a pre-constructed recommendation library for guidance. Because the guiding data is highly random, the user may be confused or overwhelmed. Such irrelevant guidance may affect the user's trust in the product and the use experience. Completely random voice guidance may also result in lower conversion rates.

Disclosure of Invention

The application provides a vehicle-mounted voice guiding method and device and a vehicle-mounted terminal, and aims to solve the above technical problems.

The embodiment of the application provides a vehicle-mounted voice guiding method, which is characterized by comprising the following steps: according to recommendation rules corresponding to the scene in the vehicle after awakening, determining an optimal frequent set meeting the actual intention of a user in a pre-built frequent set database, and determining the semantic intention corresponding to the optimal frequent set as the optimal semantic intention, wherein the frequent set database comprises a plurality of frequent sets, and the frequent sets are alternative semantic intentions which are larger than a preset support threshold; obtaining target text data through the optimal semantic intention and a pre-constructed intention database, converting the target text data into a voice instruction, and sending the voice instruction to a vehicle-mounted terminal so as to guide a target user, wherein the intention database comprises a corresponding relation between the text data and the semantic intention.

In one embodiment of the present application, determining the best frequent set that meets the actual intent of the user includes: determining an in-vehicle scene with the use times smaller than a first preset times threshold as a low-frequency scene, and determining an in-vehicle scene with the use times larger than or equal to the first preset times threshold as a high-frequency scene; and determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set meeting the actual intention of the user in a pre-built frequent set database based on the recommended scene, wherein the recommended scene comprises a first recommended scene, a second recommended scene and a third recommended scene.

In an embodiment of the present application, determining a recommended scenario based on the low frequency scenario and the high frequency scenario, and determining an optimal frequent set according to actual intention of a user in a pre-built frequent set database based on the recommended scenario includes: when the in-vehicle scene is identified as a low-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a first recommended scene; or when the in-vehicle scene is identified as a low-frequency scene and voice input is not identified, determining the in-vehicle scene as a first recommended scene; under the first recommendation scene, combining historical data, and searching a combined frequent set in a frequent set database according to a first recommendation rule to serve as an optimal frequent set, wherein the historical data comprises a use record of the frequent set; the combined frequent set is comprised of at least two frequent sets.

In an embodiment of the present application, determining a recommended scenario based on the low frequency scenario and the high frequency scenario, and determining an optimal frequent set according to actual intention of a user in a pre-built frequent set database based on the recommended scenario includes: when the in-vehicle scene is identified as a high-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a second recommended scene; or when the in-vehicle scene is identified as a high-frequency scene and voice input is not identified, determining the in-vehicle scene as a second recommended scene; and under the second recommendation scene, combining the historical data, and determining a single frequent set as an optimal frequent set in the frequent set database according to a second preset recommendation rule.

In an embodiment of the present application, determining a recommended scenario based on the low frequency scenario and the high frequency scenario, and determining an optimal frequent set according to actual intention of a user in a pre-built frequent set database based on the recommended scenario includes: after semantic recognition is carried out on voice data to be recognized to obtain initial semantic intention, if voice input is not recognized in a first preset time period, determining that the in-vehicle scene is a third recommended scene; and under the third recommendation scene, determining a combined frequent set or a single frequent set as the optimal frequent set in the frequent set database according to a third preset recommendation rule based on the initial semantic intention and the historical data.
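The scene selection described in the preceding embodiments can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the names `RecommendedScene` and `classify_scene`, the parameter names, and the choice to check the third scene before the other two are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class RecommendedScene(Enum):
    FIRST = 1   # low-frequency scene, invalid or missing voice input
    SECOND = 2  # high-frequency scene, invalid or missing voice input
    THIRD = 3   # initial intent recognized, then no further voice input


def classify_scene(use_count: int,
                   times_threshold: int,
                   has_valid_input: bool,
                   initial_intent: Optional[str] = None,
                   silence_elapsed: bool = False) -> Optional[RecommendedScene]:
    """Map the in-vehicle situation to a recommended scene (hypothetical helper)."""
    # Third scene: an initial semantic intention exists and no voice input
    # was recognized within the first preset time period.
    if initial_intent is not None and silence_elapsed:
        return RecommendedScene.THIRD
    # Valid input without a follow-up silence needs no guidance (assumption).
    if has_valid_input:
        return None
    # Fewer uses than the first preset times threshold: low-frequency scene.
    if use_count < times_threshold:
        return RecommendedScene.FIRST
    # Otherwise the scene is high-frequency.
    return RecommendedScene.SECOND
```

For instance, a scene used twice against a threshold of five, with no recognized voice input, would map to the first recommended scene under these assumptions.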

In one embodiment of the present application, determining the best frequent set according to the first recommendation rule includes: counting the co-occurrence times of the combined frequent sets in the frequent set database to obtain a first statistical result, and determining the combined frequent set with the highest co-occurrence times in the first statistical result as the optimal frequent set, wherein the co-occurrence times of the combined frequent set comprise the co-occurrence times of all the frequent sets in the combined frequent set; when a plurality of combined frequent sets with the same co-occurrence frequency exist in the first statistical result, counting the use frequency of the combined frequent sets with the same co-occurrence frequency in the historical data to obtain a second statistical result, and determining the combined frequent set with the highest use frequency in the second statistical result as an optimal frequent set; and if the combined frequent sets with the same co-occurrence times are the same in the use times in the historical data, randomly determining the optimal frequent set in the combined frequent sets with the same co-occurrence times.
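The tie-breaking order of the first recommendation rule (co-occurrence times, then use times in the historical data, then random choice) might look like the sketch below. The function name `pick_combined_set`, the dictionary-based inputs and the intent labels in the usage note are hypothetical, and the sketch assumes a non-empty set of candidates.

```python
import random
from typing import Dict, FrozenSet, Optional

Combo = FrozenSet[str]  # a combined frequent set: at least two frequent sets


def pick_combined_set(co_occurrence: Dict[Combo, int],
                      usage: Dict[Combo, int],
                      rng: Optional[random.Random] = None) -> Combo:
    """Pick the optimal combined frequent set (illustrative sketch)."""
    rng = rng or random.Random()
    # Step 1: keep the combined sets with the highest co-occurrence times.
    top = max(co_occurrence.values())
    tied = [c for c, n in co_occurrence.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Step 2: among ties, keep those used most often in the historical data.
    top_use = max(usage.get(c, 0) for c in tied)
    tied = [c for c in tied if usage.get(c, 0) == top_use]
    # Step 3: if still tied, choose at random.
    return tied[0] if len(tied) == 1 else rng.choice(tied)
```

With `{open_ac, close_window}` and `{navigate_home, play_music}` both co-occurring five times, the set with more recorded uses would be returned; only when the use counts also match does the random step apply.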

In an embodiment of the present application, determining the best single frequent set in the frequent set database according to a second preset recommendation rule includes: matching the use field corresponding to the second recommended scene based on the mapping relation between the scene in the vehicle and the semantic intention field, and marking the use field as a target use field; counting the support degree of all single frequent sets in the target use field to obtain a third statistical result, and determining the single frequent set with the highest support degree in the third statistical result as the optimal frequent set, wherein the support degree of the single frequent set comprises the frequency of occurrence of the single frequent set; if a plurality of single frequent sets with the same support degree exist in the third statistical result, counting the use times of the single frequent sets with the same support degree in the historical data to obtain a fourth statistical result, and determining the single frequent set with the highest use times in the fourth statistical result as the optimal frequent set; and if the single frequent sets with the same support degree have the same use times in the historical data, randomly determining the optimal frequent set in the single frequent sets with the same support degree.
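A possible shape for the second recommendation rule is sketched below: filter the single frequent sets by the use field mapped from the scene, then rank by support and break ties by use times. The `SingleSet` record, the scene name `vehicle_settings` and the `MUSIC` field are invented for the example; only `SMARTVEHICLE` appears in this document.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SingleSet:
    intent: str     # semantic intention of the single frequent set
    domain: str     # semantic intention field it belongs to
    support: float  # support degree (frequency of occurrence)


def pick_single_set(sets: List[SingleSet],
                    scene_to_domain: Dict[str, str],
                    scene: str,
                    usage: Dict[str, int],
                    rng: Optional[random.Random] = None) -> SingleSet:
    """Pick the optimal single frequent set for a scene (illustrative sketch)."""
    rng = rng or random.Random()
    # Match the target use field through the scene-to-field mapping.
    domain = scene_to_domain[scene]
    candidates = [s for s in sets if s.domain == domain]

    def rank(s: SingleSet):
        # Highest support first; use times in the historical data break ties.
        return (s.support, usage.get(s.intent, 0))

    best_rank = max(rank(s) for s in candidates)
    best = [s for s in candidates if rank(s) == best_rank]
    # Any remaining tie is resolved at random.
    return rng.choice(best)
```

The tuple comparison applies the two criteria in the order given in the text, so use times only matter when support degrees are equal.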

In an embodiment of the present application, determining, based on the initial semantic intent, the best frequent set in the frequent set database according to a third preset recommendation rule includes: determining a frequent set corresponding to the initial semantic intention as a special frequent set, and extracting a single frequent set associated with the special frequent set from historical data as an associated frequent set; calculating the probability of the special frequent set in the historical data, and marking the probability as a first probability; calculating the probability of the simultaneous occurrence of the special frequent set and each associated frequent set, and marking the probability as a second probability to obtain a plurality of second probabilities; determining the ratio of the first probability to the second probability as a confidence coefficient to obtain a plurality of confidence coefficients; and determining the associated frequent set with the highest confidence as the optimal frequent set.
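The confidence computation of the third recommendation rule can be illustrated as below. Note the assumption: the sketch uses the conventional association-rule confidence, that is, the second probability (special and associated set together) divided by the first probability (special set alone); the orientation of the ratio, the session-based history format and the function names are choices made for this example rather than details confirmed by the text.

```python
from typing import Dict, List, Optional, Set


def confidences(history: List[Set[str]], special: str) -> Dict[str, float]:
    """Confidence of each associated frequent set given the special one."""
    n = len(history)
    if n == 0:
        return {}
    # First probability: how often the special frequent set occurs.
    p_special = sum(special in s for s in history) / n
    if p_special == 0:
        return {}
    result: Dict[str, float] = {}
    for other in set().union(*history) - {special}:
        # Second probability: special and associated set occur together.
        p_both = sum(special in s and other in s for s in history) / n
        if p_both > 0:  # only sets that actually co-occur are "associated"
            result[other] = p_both / p_special
    return result


def best_associated(history: List[Set[str]], special: str) -> Optional[str]:
    """Return the associated frequent set with the highest confidence."""
    conf = confidences(history, special)
    return max(conf, key=conf.get) if conf else None
```

Under this reading, an associated intent that follows the initial intent in two of its three recorded occurrences has a confidence of 2/3 and would be preferred over one that follows it only once.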

In one embodiment of the present application, the process of constructing the frequent set database includes: acquiring voice data to be recognized, converting the voice data to be recognized into text data, and performing semantic intention recognition on the text data to obtain a plurality of alternative semantic intentions; calculating the support degree of each alternative semantic intention, classifying the alternative semantic intentions whose support degree is greater than or equal to a preset support degree threshold as single frequent sets, storing all the single frequent sets in the frequent set database, and classifying the alternative semantic intentions whose support degree is smaller than the preset support degree threshold as non-frequent sets; initializing a blank matrix whose rows are the single frequent sets and whose columns are also the single frequent sets, and combining the single frequent sets in the rows with the single frequent sets in the columns to obtain a plurality of combined frequent sets; and counting the co-occurrence times of the combined frequent sets, and storing the combined frequent sets whose co-occurrence times are greater than a preset co-occurrence times threshold in the frequent set database, wherein the co-occurrence times of a combined frequent set are the number of times all the frequent sets in the combined frequent set occur together.

In one embodiment of the present application, the process of building the intent database includes: collecting sample voice data in a second preset time period and semantic intention corresponding to the sample voice data, and generating a semantic analysis log; periodically extracting a plurality of text data corresponding to the same semantic intention in the semantic analysis log at intervals of a third preset time period, counting the use times of all the text data, sequencing the use times of the text data according to the sequence from high to low to obtain a text data sequence, taking the text data with the preset quantity in the text data sequence as high-frequency text data, and storing the high-frequency text data into an intention database.

The vehicle-mounted voice guiding device provided by the embodiment of the application is characterized by comprising the following components: the optimal frequent set determining module is used for determining an optimal frequent set which accords with the actual intention of a user from a pre-constructed frequent set database according to a recommendation rule corresponding to the scene in the vehicle after awakening, determining the semantic intention corresponding to the optimal frequent set as an optimal semantic intention, wherein the frequent set database comprises a plurality of frequent sets which are alternative semantic intentions greater than a preset support threshold; the voice guiding module is used for obtaining target text data through the optimal semantic intention and a pre-constructed intention database, converting the target text data into voice instructions and then sending the voice instructions to the vehicle-mounted terminal so as to guide a target user, wherein the intention database comprises the corresponding relation between the text data and the semantic intention.

The embodiment of the application provides a vehicle-mounted terminal, which comprises a processor, a memory and a communication bus, wherein the memory is used for storing a computer program; the communication bus is used for connecting the processor and the memory; and the processor is configured to execute the computer program stored in the memory, so as to implement the above-mentioned vehicle-mounted voice guidance method.

The application has the following beneficial effects: according to a recommendation rule corresponding to the in-vehicle scene, an optimal frequent set is determined in a pre-built frequent set database, and the semantic intention corresponding to the optimal frequent set is determined to be the optimal semantic intention, wherein the frequent set database comprises a plurality of frequent sets, and the frequent sets are alternative semantic intentions whose support degree is greater than a preset support threshold; target text data is obtained through the optimal semantic intention and a pre-built intention database; and the target text data is converted into a voice instruction, which is sent to a vehicle-mounted terminal so as to guide a target user. By providing voice guidance for multiple scenes and establishing a different recommendation method for each scene, personalized vehicle-mounted function recommendation is carried out for the target user, so that vehicle-mounted functions that match the user's actual intention are recommended in different use scenes, the user's interaction efficiency and comfort of use are effectively improved, and the user experience is further improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:

FIG. 1 is a flow chart of an in-vehicle voice guidance method shown in an exemplary embodiment of the application;

FIG. 2 is a block diagram of an in-vehicle voice guidance apparatus shown in an exemplary embodiment of the application;

FIG. 3 is a schematic structural view of an in-vehicle terminal according to an exemplary embodiment of the present application.

Detailed Description

Further advantages and effects of the present application will become readily apparent to those skilled in the art from the disclosure herein, by referring to the accompanying drawings and the preferred embodiments. The application may also be practiced or carried out in other embodiments that differ in their specific details, and the details of the present description may be modified or varied without departing from the spirit and scope of the present application. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.

It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application in a schematic way. Only the components related to the present application are shown in the drawings, and they are not drawn according to the number, shape and size of the components in actual implementation; the form, number and proportion of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.

In the following description, numerous details are set forth in order to provide a more thorough explanation of embodiments of the present application. It will be apparent, however, to one skilled in the art that embodiments of the present application may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the embodiments of the present application.

Embodiments of the present application respectively propose a vehicle-mounted voice guidance method, a vehicle-mounted voice guidance device, and a vehicle-mounted terminal, and these embodiments will be described in detail below.

Referring to fig. 1, fig. 1 shows a flowchart of an in-vehicle voice guidance method according to an embodiment of the present application. As shown in fig. 1, the method at least includes steps S110 to S140, and is described in detail as follows:

Step S110, determining an optimal frequent set meeting the actual intention of a user from a pre-constructed frequent set database according to a recommendation rule corresponding to the in-vehicle scene after wake-up, wherein the frequent set database comprises a plurality of frequent sets, and the frequent sets are alternative semantic intentions whose support degree is greater than a preset support threshold.

In one implementation of the application, the voice guidance function is awakened by the vehicle-mounted voice assistant based on preset voice data, e.g., the preset voice data is "small race". After waking up the voice guidance function, the current scene information is determined through topActivity (the current top-level application at the vehicle side). The voice guiding function is awakened through the keyword voice data, so that the accurate response of the vehicle-mounted voice assistant is ensured, and the voice guiding function is accurately entered.

In one embodiment of the present application, the vehicle-mounted voice assistant begins recording the user's voice data immediately after it is woken up. The vehicle-mounted voice assistant then analyzes the voice data using voice recognition technology to obtain a semantic analysis result. This conversion process ensures accurate communication of voice information and provides a reliable text basis for the subsequent construction of the intention database and the frequent set database.

Step S120, determining the semantic intent corresponding to the best frequent set as the best semantic intent.

Step S130, obtaining target text data through the optimal semantic intention and a pre-constructed intention database, wherein the intention database comprises the corresponding relation between the text data and the semantic intention.

Step S140, converting the target text data into a voice instruction and then sending the voice instruction to the vehicle-mounted terminal so as to guide the target user.

In one embodiment of the application, after the optimal semantic intention is obtained, the cloud end looks up the 3 pieces of high-frequency text data stored for the optimal semantic intention in the preset intention database, and randomly selects 1 of the 3 pieces as the target text data. The target text data is formatted based on a fixed sentence pattern to obtain formatted target text data, and the formatted target text data is converted into a voice instruction. The voice instruction is sent to the vehicle-mounted terminal through the voice assistant and played through a loudspeaker of the vehicle-mounted terminal, and the target text data is displayed on the vehicle-mounted display screen so that the user can view it conveniently.

In the technical scheme shown in FIG. 1, an optimal frequent set is determined in a pre-built frequent set database according to a recommendation rule corresponding to the in-vehicle scene, and the semantic intention corresponding to the optimal frequent set is determined to be the optimal semantic intention, wherein the frequent sets are alternative semantic intentions whose support degree is greater than a preset support threshold. Target text data is obtained through the optimal semantic intention and a pre-constructed intention database, converted into a voice instruction, and sent to the vehicle-mounted terminal to guide the target user. By providing voice guidance for multiple scenes and establishing a different recommendation method for each scene, personalized vehicle-mounted function recommendation is carried out for the target user, so that vehicle-mounted functions that match the user's actual intention are recommended in different use scenes, the user's interaction efficiency and comfort of use are effectively improved, and the user experience is further improved.

In one embodiment of the application, the process of building an intent database includes: collecting sample voice data in a second preset time period and semantic intention corresponding to the sample voice data, and generating a semantic analysis log; periodically extracting a plurality of text data corresponding to the same semantic intention in a semantic analysis log at intervals of a third preset time period, counting the use times of all the text data, sequencing the use times of the text data according to the sequence from high to low to obtain a text data sequence, taking the text data with the preset quantity in the text data sequence as high-frequency text data, and storing the high-frequency text data into an intention database. Through collecting the voice interaction text data of successful semantic analysis of the user and the vehicle-mounted voice assistant and sequencing, the use habit and the voice interaction mode of the user can be comprehensively and systematically known, so that the constructed intention database is more in accordance with the use habit of the user, and finally, the obtained intention is in accordance with the actual intention of the user in the follow-up optimal intention recommendation.

As one example, the preset quantity of text data is the first 3 items of text data. The semantic understanding analysis log is analyzed and sorted as follows: the text data corresponding to the same semantic intention is counted and analyzed, the use times of each piece of text data are calculated, the pieces of text data that a user may use under the same semantic intention are arranged in order of use times from high to low, and the first 3 items of text data are stored in the intention database as high-frequency text data. To ensure the accuracy and timeliness of intention recognition, the intention database periodically updates the high-frequency text data. The intention database is updated at intervals of a third preset time period; for example, a fixed update period such as weekly, monthly or quarterly is set, and the specific update period is adjusted according to the actual application scene and user requirements. By collecting and sorting the voice interaction text data for which semantic analysis between the user and the vehicle-mounted voice assistant succeeded, the user's usage habits and voice interaction patterns can be understood comprehensively and systematically, so that the constructed intention database better matches the user's usage habits and the intention finally obtained in the subsequent optimal intention recommendation matches the user's actual intention. Screening for high-frequency text data means that the voice guidance uses phrasings that users themselves use frequently, which ensures the universality of the guiding text.

As a specific example, a plurality of text data corresponding to the same semantic intention is extracted. For example, the semantic intention is "air conditioner on", and the plurality of text data corresponding to this semantic intention (different phrasings of the same request) includes: text data 1 "turn on an air conditioner", text data 2 "turn on the air conditioner", text data 3 "turn on the air conditioner", and text data 4 "heating, turn on the air conditioner". The use times of each piece of text data are counted, and all the text data is sorted from high to low by use times to obtain a text data sequence: {text data 2, text data 3, text data 1, text data 4}. The first three items of text data in the text data sequence, namely text data 2 "air conditioner on", text data 3 "air conditioner on", and text data 1 "air conditioner on", are taken as high-frequency text data and stored in the intention database. By collecting and sorting the voice interaction text data for which semantic analysis between the user and the vehicle-mounted voice assistant succeeded, the user's usage habits and voice interaction patterns can be understood comprehensively and systematically, so that the constructed intention database matches the user's usage habits. Screening for high-frequency text data means that the voice guidance uses phrasings that users themselves use frequently, which ensures the universality of the guiding text, so that the intention finally obtained in the subsequent optimal intention recommendation matches the user's actual intention and the user experience is further improved.
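The counting and ranking in this example can be sketched with a frequency counter per semantic intention. The function name `build_intent_db` and the log format (a sequence of `(intent, text)` pairs) are assumptions made for illustration; the "text data N" labels stand in for the actual utterances.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_intent_db(log: Iterable[Tuple[str, str]],
                    top_n: int = 3) -> Dict[str, List[str]]:
    """Keep the top_n most used texts per semantic intention (sketch)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for intent, text in log:
        counts[intent][text] += 1  # count the use times of each text
    # most_common sorts by use times from high to low.
    return {intent: [text for text, _ in c.most_common(top_n)]
            for intent, c in counts.items()}


intent = "SMARTVEHICLE-StatesControl-airConditioning-open"
log = ([(intent, "text data 1")] * 2 + [(intent, "text data 2")] * 4 +
       [(intent, "text data 3")] * 3 + [(intent, "text data 4")] * 1)
db = build_intent_db(log)
print(db[intent])  # ['text data 2', 'text data 3', 'text data 1']
```

The use counts in the sample log are made up so that the ranking reproduces the text data sequence given in the example.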

In one embodiment of the application, the process of building the intent database further comprises: collecting sample voice data in a second preset time period and semantic intention corresponding to the sample voice data, and generating a semantic analysis log; periodically extracting a plurality of text data corresponding to the same semantic intention in a semantic analysis log at intervals of a third preset time period, counting the using times of all the text data, taking the text data with the using times greater than a second preset time threshold value as high-frequency text data, and storing the high-frequency text data into an intention database. Through collecting the voice interaction text data of successful semantic analysis of the user and the vehicle-mounted voice assistant and sequencing, the use habit and the voice interaction mode of the user can be comprehensively and systematically known, so that the constructed intention database is more in accordance with the use habit of the user, and finally, the obtained intention is in accordance with the actual intention of the user in the follow-up optimal intention recommendation.

As an example, the second preset number of times threshold is 10 times, the semantic understanding analysis log is analyzed and sorted, and the sorting process further includes: and sorting a plurality of text data which can be used by a user under the same semantic intention, and storing the text data which is used more than 10 times in an intention database. Through collecting the voice interaction text data of successful semantic analysis of the user and the vehicle-mounted voice assistant, the use habit and the voice interaction mode of the user can be comprehensively and systematically known, so that the constructed intention database is more in accordance with the use habit of the user, and finally the obtained intention is in accordance with the actual intention of the user in the follow-up optimal intention recommendation.
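The threshold-based variant differs from the top-3 variant only in its selection step. A minimal sketch, with the hypothetical helper name `high_frequency_texts` and a plain list of logged texts for one semantic intention as input:

```python
from collections import Counter
from typing import Iterable, List


def high_frequency_texts(texts: Iterable[str],
                         times_threshold: int = 10) -> List[str]:
    """Return texts used more than times_threshold times (sketch)."""
    counts = Counter(texts)
    # Strictly greater than the threshold, matching "more than 10 times".
    return [text for text, n in counts.items() if n > times_threshold]
```

A text used exactly 10 times would not be stored under this reading, since the example speaks of text data used more than 10 times.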

As an example, the second preset time period is one month, and the sample voice data is voice interaction text data for which semantic analysis between the user and the vehicle-mounted voice assistant succeeded. Such voice interaction text data collected within one month is taken as historical data, and the voice interaction text data in the historical data is arranged in reverse order of use to obtain a semantic understanding analysis log. Semantic intentions corresponding to the sample voice data are then extracted from the semantic understanding analysis log. As a specific example, parameter rules are first set, comprising a semantic intention field rule, a destination rule and an operation rule. Table 1 is the semantic intention field rule parameter table, which includes a field value and a field description. Table 2 is the destination rule parameter table, which includes a destination value and a destination description. Table 3 is the operation rule parameter table, which includes an operation name, an operation type, an operation description and a must-fill item. An intention database is obtained based on the parameter rules in Tables 1, 2 and 3 and the semantic understanding analysis log. Table 4 is an example table of the intention database, which includes a semantic intention, high-frequency text data, a field value (Domain), a destination value (intel), a name (name) and an operation name (operands).
Through collecting the voice interaction text data of successful semantic analysis of the user and the vehicle-mounted voice assistant and sequencing, the use habit and the voice interaction mode of the user can be comprehensively and systematically known, so that the constructed intention database is more in accordance with the use habit of the user, and finally, the obtained intention is in accordance with the actual intention of the user in the follow-up optimal intention recommendation.

TABLE 1 semantic intent domain rule parameter Table

TABLE 2 destination rule parameter Table

TABLE 3 operation rule parameter Table

Table 4-intent database example table

As a specific example, when the optimal semantic intention is determined to be "SMARTVEHICLE-StatesControl-airConditioning-open, an air conditioner is turned on", the corresponding high-frequency text data is looked up in Table 4 based on the optimal semantic intention; it includes text data 1 "air conditioner on", text data 2 "air conditioner on" and text data 3 "air conditioner on". One item is randomly selected from the three items of high-frequency text data as the target text data, for example text data 1 "air conditioner on". Text data 1 is then formatted based on a fixed sentence pattern to obtain the formatted target text data, which is converted into a voice instruction. The voice instruction is sent to the vehicle-mounted terminal through the voice assistant and played through the speaker of the vehicle-mounted terminal, and the target text data corresponding to the voice instruction is displayed on the vehicle-mounted display screen so that the user can view it. Because the high-frequency text data is screened, text frequently used by users is used in the voice guidance process, which ensures the universality of the guiding text, makes the finally recommended semantic intention better match the user's intention, and further improves the user experience.
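The lookup, random selection and formatting steps above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary `INTENT_DB` stands in for Table 4, the three utterances are illustrative stand-ins for text data 1 to 3, and the sentence pattern `GUIDE_TEMPLATE` is hypothetical, since the text does not give the fixed sentence pattern.

```python
import random

# Hypothetical slice of the intention database (Table 4). The utterances are
# illustrative stand-ins; the real high-frequency text data is not reproduced here.
INTENT_DB = {
    "SMARTVEHICLE-StatesControl-airConditioning-open": [
        "turn on the air conditioner",
        "open the air conditioner",
        "switch on the air conditioner",
    ],
}

# Assumed sentence pattern; the text only says that a fixed pattern is used.
GUIDE_TEMPLATE = 'You can try saying: "{text}"'


def pick_target_text(best_intent, intent_db, rng=random):
    """Randomly pick one high-frequency text for the optimal semantic intention."""
    candidates = intent_db.get(best_intent, [])
    if not candidates:
        return None  # no text data stored for this intention
    return rng.choice(candidates)


def format_guidance(text, template=GUIDE_TEMPLATE):
    """Wrap the target text in the fixed sentence pattern before speech synthesis."""
    return template.format(text=text)


rng = random.Random(0)  # seeded only to make the sketch repeatable
target = pick_target_text("SMARTVEHICLE-StatesControl-airConditioning-open", INTENT_DB, rng)
prompt = format_guidance(target)
print(prompt)
```

The formatted string would then be handed to whatever text-to-speech component the terminal uses; that step is outside this sketch.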

In one embodiment of the application, the process of building the frequent set database includes: acquiring voice data to be recognized, converting the voice data to be recognized into text data, and performing semantic intention recognition on the text data to obtain a plurality of alternative semantic intentions; calculating the support degree of each alternative semantic intention, classifying each alternative semantic intention whose support degree is greater than or equal to a preset support degree threshold as a single frequent set, storing all the single frequent sets in the frequent set database, and classifying each alternative semantic intention whose support degree is smaller than the preset support degree threshold as a non-frequent set; initializing an empty matrix whose rows are the single frequent sets and whose columns are also the single frequent sets, and combining the single frequent set of each row with the single frequent set of each column to obtain a plurality of combined frequent sets; counting the co-occurrence times of the combined frequent sets, and storing the combined frequent sets whose co-occurrence times are greater than a preset co-occurrence times threshold in the frequent set database, wherein the co-occurrence times of a combined frequent set are the number of times all the frequent sets in that combined frequent set occur together.
The support degree of each intention is calculated precisely, the frequent sets are screened out according to the support degree and the non-frequent sets are removed, which ensures that the data is valid and targeted. The co-occurrence matrix is then used to determine the combined frequent sets, so that combinations with a strong association relation are accurately extracted and provide strong support for subsequent recommendation. The resulting frequent set database is comprehensive and efficient, covers both single frequent sets and combined frequent sets, and provides users with rich and varied choices.

As an example, the preset support degree threshold is 0.6, the preset co-occurrence times threshold is 3, and the support degree of an alternative semantic intention is its occurrence frequency. The voice interaction text data for which semantic parsing between the user and the vehicle-mounted voice assistant succeeded within one month is collected as historical data and arranged in reverse order of use to obtain a semantic understanding parsing log, which contains a plurality of alternative semantic intentions; alternatively, voice data to be recognized is acquired and semantic intention recognition is performed on it to obtain a plurality of alternative semantic intentions. Table 5 is an example table of user names and alternative semantic intentions, and the support degree of each alternative semantic intention in Table 5 is calculated as follows. The alternative semantic intention S1 appears for user names U1 and U2, so its support degree is 2/3. The alternative semantic intention S2 appears for user names U1, U2 and U3, so its support degree is 3/3.
The alternative semantic intention S3 appears for user names U1, U2 and U3, so its support degree is 3/3. The alternative semantic intention S4 appears for user name U1, so its support degree is 1/3. The alternative semantic intention S5 appears for user names U1, U2 and U3, so its support degree is 3/3. The alternative semantic intention S6 appears for user name U2, so its support degree is 1/3. The alternative semantic intentions S7 and S8 each appear for user name U3, so each has a support degree of 1/3. The alternative semantic intentions whose support degree is greater than or equal to the preset support degree threshold of 0.6 are therefore S1, S2, S3 and S5. These are classified as single frequent sets, and the alternative semantic intentions whose support degree is smaller than 0.6 are classified as non-frequent sets; that is, the single frequent set sequence is {S1, S2, S3, S5}, the non-frequent set sequence is {S4, S6, S7, S8}, and the non-frequent sets are deleted. To avoid data duplication, the single frequent set sequence in this embodiment is denoted as {s1, s2, s3, s5} and the non-frequent set sequence as {s4, s6, s7, s8}. Based on the determined single frequent sets, the co-occurrence among the single frequent sets is calculated by constructing a co-occurrence matrix, in which each cell represents the number of times two intentions occur together in the behavior of the same user.
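The support calculation above can be sketched in a few lines. The per-user intention lists are taken from the worked example in this description (Table 5 itself is not reproduced); the function names and the use of exact fractions are illustrative choices, not part of the described method.

```python
from fractions import Fraction

# Per-user alternative semantic intentions, as listed in the worked example.
USER_INTENTS = {
    "U1": ["s1", "s2", "s3", "s4", "s5"],
    "U2": ["s1", "s3", "s2", "s6", "s5"],
    "U3": ["s5", "s7", "s3", "s2", "s8"],
}


def compute_support(user_intents):
    """Support degree = share of users whose record contains the intention."""
    total_users = len(user_intents)
    counts = {}
    for intents in user_intents.values():
        for intent in set(intents):  # count each user at most once per intention
            counts[intent] = counts.get(intent, 0) + 1
    return {intent: Fraction(n, total_users) for intent, n in counts.items()}


def split_by_support(support, threshold):
    """Split intentions into single frequent sets and non-frequent sets."""
    frequent = sorted(i for i, s in support.items() if s >= threshold)
    non_frequent = sorted(i for i, s in support.items() if s < threshold)
    return frequent, non_frequent


support = compute_support(USER_INTENTS)
single_frequent, non_frequent = split_by_support(support, Fraction(6, 10))
print(single_frequent)  # ['s1', 's2', 's3', 's5']
print(non_frequent)     # ['s4', 's6', 's7', 's8']
```

Exact fractions are used so that 2/3 is compared against the 0.6 threshold without rounding effects.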

As a specific example, the co-occurrence matrix is obtained as follows. An empty matrix is first initialized, whose rows and columns are both the single frequent set sequence {s1, s2, s3, s5}. The empty matrix is then filled according to the semantic intentions of each user.
For U1 (s1, s2, s3, s4, s5): because s4 is a non-frequent set, it is removed, and the number of times any two single frequent sets occur together in U1 is counted. s1 and s2 co-occur 1 time, s1 and s3 co-occur 1 time, s1 and s5 co-occur 1 time, s2 and s3 co-occur 1 time, s2 and s5 co-occur 1 time, and s3 and s5 co-occur 1 time.
For U2 (s1, s3, s2, s6, s5): because s6 is a non-frequent set, it is removed, and the number of times any two single frequent sets occur together in U2 is counted. s1 and s2 co-occur 1 time, s1 and s3 co-occur 1 time, s1 and s5 co-occur 1 time, s2 and s3 co-occur 1 time, s2 and s5 co-occur 1 time, and s3 and s5 co-occur 1 time.
For U3 (s5, s7, s3, s2, s8): because s7 and s8 are non-frequent sets, they are removed. s2 and s3 co-occur 1 time, s2 and s5 co-occur 1 time, and s3 and s5 co-occur 1 time.
The co-occurrence times of the same pair of single frequent sets are then added up across the three users to obtain the final co-occurrence matrix:

   | s1 | s2 | s3 | s5 |
--------------------------
s1 | 0  | 2  | 2  | 2  |
--------------------------
s2 | 2  | 0  | 3  | 3  |
--------------------------
s3 | 2  | 3  | 0  | 3  |
--------------------------
s5 | 2  | 3  | 3  | 0  |

Based on the co-occurrence matrix, each pair of single frequent sets whose co-occurrence times are greater than or equal to the preset co-occurrence times threshold of 3 is taken as a combined frequent set. According to the co-occurrence matrix, the combined frequent sets meeting this condition are {(s2, s5), (s2, s3), (s3, s5)}, and these three combined frequent sets are stored in the frequent set database. The support degree of each intention is calculated precisely, the frequent sets are screened out according to the support degree and the non-frequent sets are removed, which ensures that the data is valid and targeted; the co-occurrence matrix is then used to determine the combined frequent sets, so that combinations with a strong association relation are accurately extracted and provide strong support for subsequent recommendation. The resulting frequent set database is comprehensive and efficient, covers both single frequent sets and combined frequent sets, and provides users with rich and varied choices.
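The matrix construction and thresholding described above can be reproduced with a short sketch. The per-user lists and the threshold of 3 come from the worked example; the nested-dictionary representation of the matrix and the function names are assumptions made for illustration.

```python
from itertools import combinations

# Per-user alternative semantic intentions from the worked example.
USER_INTENTS = {
    "U1": ["s1", "s2", "s3", "s4", "s5"],
    "U2": ["s1", "s3", "s2", "s6", "s5"],
    "U3": ["s5", "s7", "s3", "s2", "s8"],
}
SINGLE_FREQUENT = ["s1", "s2", "s3", "s5"]


def build_cooccurrence(user_intents, frequent):
    """Fill an all-zero matrix whose rows and columns are the single frequent sets."""
    matrix = {row: {col: 0 for col in frequent} for row in frequent}
    for intents in user_intents.values():
        kept = sorted(set(intents) & set(frequent))  # drop non-frequent sets
        for a, b in combinations(kept, 2):
            matrix[a][b] += 1  # keep the matrix symmetric
            matrix[b][a] += 1
    return matrix


def combined_frequent_sets(matrix, threshold):
    """Pairs whose co-occurrence count reaches the threshold (>=, as in the example)."""
    items = sorted(matrix)
    return [(a, b) for a, b in combinations(items, 2) if matrix[a][b] >= threshold]


matrix = build_cooccurrence(USER_INTENTS, SINGLE_FREQUENT)
pairs = combined_frequent_sets(matrix, 3)
print(matrix["s1"])  # {'s1': 0, 's2': 2, 's3': 2, 's5': 2}
print(pairs)         # [('s2', 's3'), ('s2', 's5'), ('s3', 's5')]
```

The comparison uses "greater than or equal to", following the worked example with threshold 3.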

Table 5-example table of usernames and alternative semantic intent

In the above example, the single frequent set sequence {s1, s2, s3, s5} and the combined frequent set sequence {(s2, s5), (s2, s3), (s3, s5)} are stored in the frequent set database. Table 6 is an example table of the frequent set database, which includes the frequent sets, their types, their support degrees and their co-occurrence times. To ensure the accuracy and efficiency of the recommendation library, the data is updated periodically. A fixed update period is set, such as weekly, monthly or quarterly, and the specific update period is adjusted according to the actual application scenario and user requirements.

Table 6-example table of frequent database

In one embodiment of the present application, determining the best frequent set that meets the actual intent of the user includes: determining an in-vehicle scene with the use times smaller than a first preset times threshold as a low-frequency scene, and determining an in-vehicle scene with the use times larger than or equal to the first preset times threshold as a high-frequency scene; and determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set which accords with the actual intention of the user in a pre-constructed frequent set database based on the recommended scene, wherein the recommended scene comprises a first recommended scene, a second recommended scene and a third recommended scene. By distinguishing different in-vehicle scenes, different recommendation methods are established for different scenes, and personalized vehicle-mounted function recommendation is carried out on a target user, so that vehicle-mounted functions which accord with actual intention of the user are recommended for different use scenes, interaction efficiency and use comfort of the user are effectively improved, and user experience is further improved.

In one embodiment of the present application, determining a recommended scene based on a low frequency scene and a high frequency scene, and determining an optimal frequent set according with actual intention of a user in a pre-built frequent set database based on the recommended scene, includes: when the in-vehicle scene is identified as a low-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a first recommended scene; or when the in-vehicle scene is identified as a low-frequency scene and voice input is not identified, determining the in-vehicle scene as a first recommended scene; under a first recommendation scene, combining historical data, searching a combined frequent set in a frequent set database according to a first recommendation rule to serve as an optimal frequent set, wherein the historical data comprises a use record of the frequent set; the combined frequent set is comprised of at least two frequent sets. In low frequency scenarios, the user's voice interaction behavior tends to be more random and uncertain. A single frequent set may not accurately capture these unstable patterns of user behavior. And the combination frequent item set can comprehensively consider a plurality of factors and reflect the actual demands and habits of the user more comprehensively, so that the vehicle-mounted function which accords with the actual intention of the user is recommended, the interaction efficiency and the use comfort of the user are effectively improved, and the user experience is further improved.

As one example, the first preset number-of-times threshold is 5: an in-vehicle scene used fewer than 5 times is determined as a low-frequency scene, and an in-vehicle scene used 5 or more times is determined as a high-frequency scene. After the voice guidance function is woken up, if the in-vehicle scene is identified as a low-frequency scene and no voice data is recognized, the current in-vehicle scene is determined as the first recommended scene; likewise, if the in-vehicle scene is identified as a low-frequency scene and the voice recognition result is invalid voice data, for example "haha", "hey" or "123", the current in-vehicle scene is determined as the first recommended scene. In the first recommended scene, a combined frequent set is searched for in the frequent set database according to the first recommendation rule, in combination with the historical data, and is used as the optimal frequent set, wherein the historical data includes the usage records of the frequent sets and a combined frequent set consists of at least two frequent sets. As another example, when the in-vehicle scene is identified as the homepage and the recognition result of the voice to be recognized is invalid voice data, the in-vehicle scene is determined as the first recommended scene; or when the in-vehicle scene is identified as the homepage and no voice input is recognized, the in-vehicle scene is determined as the first recommended scene.
In homepage and low-frequency scenes, the user's voice interaction behavior tends to be more random and uncertain, and a single frequent set may not accurately capture these unstable behavior patterns. A combined frequent set takes multiple factors into account and reflects the user's actual needs and habits more comprehensively, so that vehicle-mounted functions matching the user's actual intention are recommended, which effectively improves the user's interaction efficiency and comfort of use and, in turn, the user experience.

In one embodiment of the present application, determining a recommended scene based on a low frequency scene and a high frequency scene, and determining an optimal frequent set according with actual intention of a user in a pre-built frequent set database based on the recommended scene, includes: when the in-vehicle scene is identified as a high-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a second recommended scene; or when the in-vehicle scene is identified as a high-frequency scene and the voice input is not identified, determining the in-vehicle scene as a second recommended scene; and under a second recommendation scene, combining the historical data, and determining a single frequent set as an optimal frequent set in a frequent set database according to a second preset recommendation rule. Under a high-frequency scene, the voice interaction behavior of the user is often clear, a single frequent set can capture a stable user behavior mode more accurately, the actual demand and habit of the user are reflected more accurately, the vehicle-mounted function conforming to the actual intention of the user is recommended, the interaction efficiency and the use comfort of the user are effectively improved, and the user experience is further improved.

As one example, the first preset number-of-times threshold is 5: an in-vehicle scene used fewer than 5 times is determined as a low-frequency scene, and an in-vehicle scene used 5 or more times is determined as a high-frequency scene. After the voice guidance function is woken up, if the in-vehicle scene is identified as a high-frequency scene and no voice data is recognized, the current in-vehicle scene is determined as the second recommended scene; likewise, if the in-vehicle scene is identified as a high-frequency scene and the voice recognition result is invalid voice data, for example "haha", "hey" or "123", the current in-vehicle scene is determined as the second recommended scene. In the second recommended scene, a single frequent set is determined as the optimal frequent set in the frequent set database according to the second preset recommendation rule, in combination with the historical data. In a high-frequency scene, the user's voice interaction behavior is usually clear, so a single frequent set captures the stable behavior pattern more accurately and reflects the user's actual needs and habits more precisely. Vehicle-mounted functions matching the user's actual intention are therefore recommended, which effectively improves the user's interaction efficiency and comfort of use and, in turn, the user experience.

In one embodiment of the present application, determining a recommended scene based on a low frequency scene and a high frequency scene, and determining an optimal frequent set according with actual intention of a user in a pre-built frequent set database based on the recommended scene, includes: after semantic recognition is carried out on the voice data to be recognized to obtain initial semantic intention, if voice input is not recognized in a first preset time period, determining that the in-vehicle scene is a third recommended scene; and under a third recommendation scene, determining a combined frequent set or a single frequent set as the best frequent set in a frequent set database according to a third preset recommendation rule based on the initial semantic intention and the historical data. The third recommendation scene is determined by identifying the voice data of the user, vehicle-mounted function recommendation is carried out on the current scene, and personalized vehicle-mounted function recommendation is carried out on the target user, so that the actual demand and habit of the user are reflected more accurately, the vehicle-mounted function which accords with the actual intention of the user is recommended, the interaction efficiency and the use comfort of the user are effectively improved, and the user experience is further improved.

As an example, the first preset time period is 5 seconds. After semantic recognition is performed on the voice data to be recognized and the initial semantic intention is obtained, a timer is set and starts counting from the moment the initial semantic intention is successfully parsed; if no voice input is recognized within the first preset time period of 5 seconds, the in-vehicle scene is determined to be the third recommended scene. In the third recommended scene, a combined frequent set or a single frequent set is determined as the best frequent set in the frequent set database according to the third preset recommendation rule, based on the initial semantic intention and the historical data. The third recommended scene is determined by recognizing the user's voice data, and personalized vehicle-mounted function recommendation is carried out for the target user in the current scene, so that the user's actual needs and habits are reflected more accurately, vehicle-mounted functions matching the user's actual intention are recommended, the user's interaction efficiency and comfort of use are effectively improved, and the user experience is further improved.
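The three recommended scenes described above can be summarized as a small decision function. This is a sketch under stated assumptions: the thresholds are the example values (5 uses, 5 seconds), the `recognition` argument with values "none", "invalid" and "valid" is a hypothetical encoding of the speech recognition outcome, and the homepage variant of the first recommended scene is left out for brevity.

```python
from enum import Enum

LOW_FREQ_THRESHOLD = 5      # first preset number-of-times threshold (example value)
FOLLOW_UP_TIMEOUT_S = 5.0   # first preset time period in seconds (example value)


class RecommendedScene(Enum):
    FIRST = 1   # low-frequency scene, no or invalid speech
    SECOND = 2  # high-frequency scene, no or invalid speech
    THIRD = 3   # initial intention parsed, then silence for the preset period


def classify_scene(use_count, recognition, initial_intent=None,
                   silence_after_intent_s=0.0):
    """Return the recommended scene, or None when no guidance is triggered.

    `recognition` is a hypothetical label: "none" (no voice input),
    "invalid" (e.g. "haha", "hey", "123") or "valid".
    """
    if initial_intent is not None:
        # Third scene: an intention was parsed, then nothing was heard in time.
        if silence_after_intent_s >= FOLLOW_UP_TIMEOUT_S:
            return RecommendedScene.THIRD
        return None
    if recognition in ("none", "invalid"):
        if use_count < LOW_FREQ_THRESHOLD:
            return RecommendedScene.FIRST
        return RecommendedScene.SECOND
    return None


print(classify_scene(2, "none"))     # RecommendedScene.FIRST
print(classify_scene(5, "invalid"))  # RecommendedScene.SECOND
```

Returning None for valid speech reflects that the normal command path, not guidance, handles that case; the text does not spell this out, so it is an assumption of the sketch.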

In one embodiment of the present application, determining the best frequent set according to the first recommendation rule includes: counting the co-occurrence times of the combined frequent sets in the frequent set database to obtain a first statistical result, and determining the combined frequent set with the highest co-occurrence times in the first statistical result as the optimal frequent set, wherein the co-occurrence times of a combined frequent set are the number of times all the frequent sets in that combined frequent set occur together; when a plurality of combined frequent sets with the same co-occurrence times exist in the first statistical result, counting the number of uses of those combined frequent sets in the historical data to obtain a second statistical result, and determining the combined frequent set with the highest number of uses in the second statistical result as the optimal frequent set; and if the combined frequent sets with the same co-occurrence times also have the same number of uses in the historical data, randomly determining the optimal frequent set from among them. In a low-frequency scene, the user's voice interaction behavior is more random and uncertain, and a single frequent set may not accurately capture these unstable behavior patterns. A combined frequent set takes multiple factors into account and reflects the user's actual needs and habits more comprehensively. The optimal frequent set is therefore determined from the combined frequent sets, the intention instruction that best matches the user's needs is accurately selected in the database based on the optimal frequent set, and it is conveyed to the user through voice and text guidance, so that a personalized and intelligent service experience is achieved.

As an example, when the current in-vehicle scene is identified as the first recommended scene, the co-occurrence times of all the combined frequent sets in the frequent set database are counted, that is, the combined frequent sets are sorted in descending order of co-occurrence times to obtain the first statistical result. The first statistical result is analyzed, and the combined frequent set with the highest co-occurrence times is used as the best frequent set. If a plurality of combined frequent sets in the first statistical result have the same co-occurrence times, the number of times those combined frequent sets are used in the historical data is counted to obtain the second statistical result, and the combined frequent set with the highest number of uses in the second statistical result is determined as the best frequent set. For example, the combined frequent sets corresponding to user name U2 are extracted, and counting shows 3 combined frequent sets with the same co-occurrence times: the combined frequent set {s2, s5} co-occurs 1 time, the combined frequent set {s2, s3} co-occurs 1 time, and the combined frequent set {s3, s5} co-occurs 1 time. If the statistics on the historical data show that each of these combined frequent sets has the same number of uses, one of them is recommended at random, for example {s3, s5}; if instead the statistics show that {s3, s5} has the highest number of uses in the historical data, {s3, s5} is determined as the optimal frequent set.
Under a low-frequency scene, the voice interaction behaviors of the user are more random and uncertain, a single frequent set can not accurately capture the unstable user behavior modes, a plurality of factors can be comprehensively considered by combining frequent item sets, the actual demands and habits of the user are more comprehensively reflected, an optimal frequent set is determined by combining the frequent sets, the intention instruction which is most in line with the demands of the user is accurately selected in a database based on the optimal frequent set, and the intention instruction is transmitted to the user in a voice and text guiding mode, so that personalized and intelligent service experience is realized.
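The tie-breaking order of the first recommendation rule (co-occurrence times, then number of uses in the historical data, then random choice) can be sketched as below. The co-occurrence counts are those of the combined frequent sets stored in the frequent set database in the earlier example; the usage counts in `usage` are hypothetical, and the function name is illustrative.

```python
import random


def pick_best_combined(cooccurrence, usage, rng=random):
    """Pick the best combined frequent set.

    cooccurrence: {pair: co-occurrence times}
    usage: {pair: number of uses in the historical data} (missing pairs count as 0)
    """
    if not cooccurrence:
        return None
    # Step 1: keep the pairs with the highest co-occurrence times.
    top = max(cooccurrence.values())
    tied = [pair for pair, count in cooccurrence.items() if count == top]
    # Step 2: among ties, keep the pairs used most often in the history.
    most_used = max(usage.get(pair, 0) for pair in tied)
    tied = [pair for pair in tied if usage.get(pair, 0) == most_used]
    # Step 3: if still tied, choose at random.
    return rng.choice(tied)


COOCCURRENCE = {("s2", "s5"): 3, ("s2", "s3"): 3, ("s3", "s5"): 3}
usage = {("s2", "s3"): 1, ("s3", "s5"): 4}  # hypothetical usage records

best = pick_best_combined(COOCCURRENCE, usage)
print(best)  # ('s3', 's5')

# With no usage records at all, every pair ties and the pick is random.
fallback = pick_best_combined(COOCCURRENCE, {}, random.Random(0))
```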

In one embodiment of the present application, determining the best single frequent set in the frequent set database according to the second preset recommendation rule includes: matching the use field corresponding to the second recommended scene based on the mapping relation between the scene in the vehicle and the semantic intention field, and marking the use field as a target use field; counting the support degree of all single frequent sets in the target use field, obtaining a third statistical result, and determining the single frequent set with the highest support degree in the third statistical result as the optimal frequent set, wherein the support degree of the single frequent set comprises the occurrence frequency of the single frequent set; if the third statistical result has a plurality of single frequent sets with the same support degree, counting the use times of the single frequent sets with the same support degree in the historical data to obtain a fourth statistical result, and determining the single frequent set with the highest use times in the fourth statistical result as the optimal frequent set; if the single frequent sets with the same support degree have the same use times in the historical data, the optimal frequent set is randomly determined in the single frequent sets with the same support degree. 
Under a high-frequency scene, the voice interaction behavior of the user is often clear, a single frequent set can capture a stable user behavior mode more accurately, the actual demand and habit of the user are reflected more accurately, the optimal frequent set is determined through the single frequent set, the voice instruction which best meets the user demand is selected accurately in a database based on the optimal frequent set, and is transmitted to the user in a voice and text guiding mode, so that the vehicle-mounted function which meets the actual intention of the user is recommended, the interaction efficiency and the use comfort of the user are effectively improved, and personalized and intelligent service experience is realized.

As an example, a mapping relationship between in-vehicle scenes and semantic intention fields is established so that each in-vehicle scene can be accurately matched to its corresponding field. A corresponding semantic intention field is defined for each in-vehicle scene. For example, the in-vehicle scenes are the air-conditioning page, the setting page, the music interface and the navigation page, and the semantic intention fields are SMARTVEHICLE, VEHICLECONTROL, MUSIC and MAP. The mapping relationship is then established as follows: the semantic intention field corresponding to the air-conditioning page is SMARTVEHICLE, the semantic intention field corresponding to the setting page is VEHICLECONTROL, the semantic intention field corresponding to the music interface is MUSIC, and the semantic intention field corresponding to the navigation page is MAP.

As an example, the target usage field is SMARTVEHICLE. When the current in-vehicle scene is identified as the second recommended scene, for example an air-conditioning page, the semantic intention field corresponding to the air-conditioning page is obtained as SMARTVEHICLE according to the mapping relationship between in-vehicle scenes and semantic intention fields, and associated recommendation is performed on the single frequent sets in SMARTVEHICLE, where the semantic intention field is read from the field value (Domain) of the semantic intention. The support degrees of the single frequent sets in SMARTVEHICLE are counted to obtain the third statistical result, and the single frequent set with the highest support degree in the third statistical result is determined as the best frequent set; for example, if the single frequent set with the highest support degree is s2, s2 is determined as the best frequent set. If the third statistical result contains a plurality of single frequent sets with the same support degree, the number of uses of those single frequent sets in the historical data is counted to obtain a fourth statistical result, and the single frequent set with the highest number of uses in the fourth statistical result is determined as the optimal frequent set; if the single frequent sets with the same support degree also have the same number of uses in the historical data, the optimal frequent set is randomly determined from among them.
Under a high-frequency scene, the voice interaction behavior of the user is often clear, a single frequent set can capture a stable user behavior mode more accurately, the actual demand and habit of the user are reflected more accurately, the optimal frequent set is determined through the single frequent set, the voice instruction which best meets the user demand is selected accurately in a database based on the optimal frequent set, and is transmitted to the user in a voice and text guiding mode, so that the vehicle-mounted function which meets the actual intention of the user is recommended, the interaction efficiency and the use comfort of the user are effectively improved, and personalized and intelligent service experience is realized.
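The second recommendation rule can be sketched in the same style: map the in-vehicle scene to a semantic intention field, then rank the single frequent sets of that field by support degree, number of uses and finally random choice. The scene keys, the assignment of s1, s2, s3 and s5 to fields, and the usage counts are all hypothetical; only the four field names and the scene-to-field pairing come from the description.

```python
import random

# Scene-to-field mapping from the description; the scene keys are illustrative names.
SCENE_TO_DOMAIN = {
    "air_conditioning_page": "SMARTVEHICLE",
    "setting_page": "VEHICLECONTROL",
    "music_interface": "MUSIC",
    "navigation_page": "MAP",
}

# Hypothetical records of single frequent sets (field assignment is assumed).
FREQUENT_SETS = [
    {"id": "s1", "domain": "SMARTVEHICLE", "support": 2 / 3},
    {"id": "s2", "domain": "SMARTVEHICLE", "support": 1.0},
    {"id": "s3", "domain": "MUSIC", "support": 1.0},
    {"id": "s5", "domain": "MAP", "support": 1.0},
]


def pick_best_single(scene, frequent_sets, usage, rng=random):
    """Pick the best single frequent set for a second recommended scene."""
    domain = SCENE_TO_DOMAIN.get(scene)
    in_domain = [f for f in frequent_sets if f["domain"] == domain]
    if not in_domain:
        return None
    # Step 1: highest support degree within the target usage field.
    top = max(f["support"] for f in in_domain)
    tied = [f["id"] for f in in_domain if f["support"] == top]
    # Step 2: among ties, highest number of uses in the historical data.
    most_used = max(usage.get(i, 0) for i in tied)
    tied = [i for i in tied if usage.get(i, 0) == most_used]
    # Step 3: if still tied, choose at random.
    return rng.choice(tied)


best = pick_best_single("air_conditioning_page", FREQUENT_SETS, usage={})
print(best)  # s2
```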

In one embodiment of the present application, determining the best frequent set in the frequent set database according to the third preset recommendation rule based on the initial semantic intention includes: determining the frequent set corresponding to the initial semantic intention as a special frequent set, and extracting the single frequent sets associated with the special frequent set from the historical data as associated frequent sets; calculating the probability that the special frequent set occurs in the historical data, and recording it as a first probability; calculating the probability that the special frequent set and each associated frequent set occur together, and recording it as a second probability, to obtain a plurality of second probabilities; determining the ratio of each second probability to the first probability as a confidence, to obtain a plurality of confidences; and determining the associated frequent set with the highest confidence as the optimal frequent set. In the third recommended scene, determining the associated frequent set with the highest confidence as the optimal frequent set reflects the user's actual needs and habits more accurately; the voice instruction that best matches the user's needs is accurately selected in the database based on the optimal frequent set and conveyed to the user through voice and text guidance, so that a personalized and intelligent service experience is achieved.

As an example, the special frequent set is s1, and the associated frequent set is s1, s2, s3, s4, s5, s1, s2. The non-frequent set s4 is removed from the associated frequent set, giving the filtered associated frequent set s1, s2, s3, s5, s1, s2. In the historical data, the number of occurrences of the special frequent set s1 is first counted as count_s1, the number of co-occurrences of s1 and s2 as count_s1_and_s2, of s1 and s3 as count_s1_and_s3, and of s1 and s5 as count_s1_and_s5. The probability that the special frequent set occurs in the historical data is calculated and recorded as the first probability, namely P_s1 = count_s1 / len(user_history), where len(user_history) is the total number of occurrences of all frequent sets in the historical data. The probability that the special frequent set and each associated frequent set occur together is calculated and recorded as a second probability, giving a plurality of second probabilities: P_s1_and_s2 = count_s1_and_s2 / len(user_history), P_s1_and_s3 = count_s1_and_s3 / len(user_history), and P_s1_and_s5 = count_s1_and_s5 / len(user_history). The ratio of each second probability to the first probability is determined as the confidence, giving a plurality of confidences: confidence(s2|s1) = confidence_s2_given_s1 = P_s1_and_s2 / P_s1, confidence(s3|s1) = confidence_s3_given_s1 = P_s1_and_s3 / P_s1, and confidence(s5|s1) = confidence_s5_given_s1 = P_s1_and_s5 / P_s1. In this example, confidence(s2|s1) is calculated to be 1, confidence(s3|s1) is 0.33, and confidence(s5|s1) is 0.33; that is, given that s1 occurs, the probability that s2 occurs is 100%, so s2, which has the highest confidence, is determined as the best frequent set. In the third recommendation scene, determining the associated frequent set with the highest confidence as the optimal frequent set reflects the user's actual needs and habits more accurately, so that the voice instruction that best meets the user's needs is selected from the database based on the optimal frequent set and delivered to the user as voice and text guidance, achieving a personalized and intelligent service experience.
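The confidence calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `best_associated_set` and its parameters are hypothetical, the counts are taken as inputs because the text does not specify how co-occurrences are counted, and the sample counts are invented solely so that the resulting confidences match the 1, 0.33 and 0.33 quoted above.

```python
def best_associated_set(count_special, co_counts, total):
    """Pick the associated frequent set with the highest confidence.

    count_special: occurrences of the special frequent set (count_s1)
    co_counts:     {associated set: co-occurrence count with the special set}
    total:         len(user_history), total occurrences of all frequent sets
    """
    if count_special <= 0 or total <= 0:
        raise ValueError("count_special and total must be positive")
    if not co_counts:
        raise ValueError("at least one associated frequent set is required")

    p_special = count_special / total            # first probability
    confidences = {}
    for name, co_count in co_counts.items():
        p_joint = co_count / total               # second probability
        confidences[name] = p_joint / p_special  # confidence(name | special)

    best = max(confidences, key=confidences.get)
    return best, confidences


# Hypothetical counts (assumption): chosen only to reproduce the quoted confidences.
best, conf = best_associated_set(
    count_special=3,
    co_counts={"s2": 3, "s3": 1, "s5": 1},
    total=9,
)
print(best, {name: round(value, 2) for name, value in conf.items()})
```

Because both probabilities share the denominator len(user_history), the confidence reduces to count_s1_and_sX / count_s1; the sketch keeps the two-step form only to mirror the wording of the embodiment.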

In the above embodiment, after the associated frequent set s2 with the highest confidence is determined as the best frequent set, the corresponding text data is looked up in the intent database according to the semantic intent corresponding to the best frequent set, so as to obtain the high-frequency text data in the intent database. For example, the selected high-frequency text data is "turn on the air conditioner", which is formatted into a fixed sentence pattern: for a single frequent set, "You can try saying" + "turn on the air conditioner"; for a combined frequent set, "You can try saying" + "turn on the air conditioner, set to 25 degrees Celsius". A TTS (text-to-speech) service is then called to convert the formatted target text data into a voice instruction, and the voice instruction and the target text data are transmitted to the terminal. After receiving them, the terminal first plays the voice instruction through the loudspeaker so that the user can hear it. At the same time, the target text data can be displayed on the central control screen of the terminal so that the user can read it. After hearing the voice instruction or seeing the target text data on the central control screen, the user can speak the instruction correctly and complete the corresponding voice interaction. Delivering the guidance to the user as both voice and text achieves a personalized and intelligent service experience.
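The fixed sentence pattern described above can be illustrated with a short Python sketch. The names `GUIDE_PREFIX` and `format_guidance` are hypothetical, and the exact punctuation of the prompt is an assumption; the TTS call is deliberately left out because the text does not name a specific service.

```python
GUIDE_PREFIX = "You can try saying: "  # assumed wording of the fixed prefix


def format_guidance(phrases):
    """Build the guidance sentence from high-frequency phrases.

    One phrase corresponds to a single frequent set; several phrases
    correspond to a combined frequent set.
    """
    cleaned = [p.strip() for p in phrases if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one phrase is required")
    return GUIDE_PREFIX + ", ".join(cleaned)


single = format_guidance(["turn on the air conditioner"])
combined = format_guidance(["turn on the air conditioner", "set to 25 degrees Celsius"])
print(single)
print(combined)
```

The resulting string would be passed both to the text-to-speech step and to the central control screen, so the spoken and displayed guidance stay identical.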

In one embodiment of the application, an operation success rate analysis is performed on the voice guidance history: the proportion of operations that users complete successfully by following the guidance instructions is counted, a threshold is set to identify instruction types with a low success rate, and those instruction types are deleted from the recommendation library. The average time users take to complete an operation is analyzed to find possible operating bottlenecks or unnecessarily complicated operations. Based on the user operation data and user feedback, the wording, speech rate, intonation and the like of the guidance instructions are adjusted so that they better match the user's cognitive habits.
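A minimal Python sketch of the success rate analysis follows. The record layout (instruction type plus a success flag), the function name `prune_low_success`, and the default threshold of 0.5 are all assumptions made for illustration; the text only states that a threshold is set.

```python
from collections import defaultdict


def prune_low_success(records, recommended, min_success_rate=0.5):
    """Drop instruction types whose guided success rate is below the threshold.

    records:     iterable of (instruction_type, succeeded) pairs
    recommended: set of instruction types currently in the recommendation library
    Returns (kept instruction types, success rate per observed type).
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for instruction_type, succeeded in records:
        attempts[instruction_type] += 1
        if succeeded:
            successes[instruction_type] += 1

    rates = {t: successes[t] / attempts[t] for t in attempts}
    # Types with no history are kept: there is no evidence against them yet.
    kept = {t for t in recommended if rates.get(t, 1.0) >= min_success_rate}
    return kept, rates


history = [
    ("ac_on", True), ("ac_on", True),
    ("seat_heat", False), ("seat_heat", True), ("seat_heat", False),
]
kept, rates = prune_low_success(history, {"ac_on", "seat_heat", "nav_home"})
print(sorted(kept), rates)
```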

Fig. 2 shows a block diagram of an in-vehicle voice guidance apparatus according to an exemplary embodiment of the present application. Referring to fig. 2, the in-vehicle voice guidance apparatus 200 according to an embodiment of the present application includes a best frequent set determination module 210 and a voice guidance module 220. The best frequent set determination module 210 is configured to determine, according to the recommendation rule corresponding to the in-vehicle scene after wake-up, a best frequent set conforming to the actual intention of the user from a pre-built frequent set database, and to determine the semantic intention corresponding to the best frequent set as the best semantic intention, where the frequent set database includes a plurality of frequent sets, and a frequent set is an alternative semantic intention whose support is greater than a preset support threshold. The voice guidance module 220 is configured to obtain target text data through the best semantic intention and a pre-built intention database, convert the target text data into a voice instruction, and send the voice instruction to the vehicle-mounted terminal so as to guide the target user, where the intention database includes the correspondence between text data and semantic intentions. By determining the best frequent set according to the recommendation rule for the in-vehicle scene and guiding the target user by voice, the apparatus establishes different recommendation methods for different scenes, so that the recommended vehicle-mounted functions conform to the actual intention of the user and the recommendation efficiency is improved.

In an embodiment of the present application, the vehicle-mounted voice guidance device further includes a scene recognition module 230, where the scene recognition module is configured to determine an in-vehicle scene with a usage frequency smaller than a first preset frequency threshold as a low-frequency scene, and determine an in-vehicle scene with a usage frequency greater than or equal to the first preset frequency threshold as a high-frequency scene; and determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set which accords with the actual intention of the user in a pre-constructed frequent set database based on the recommended scene, wherein the recommended scene comprises a first recommended scene, a second recommended scene and a third recommended scene.

In an embodiment of the present application, the optimal frequent set determining module 210 is configured to determine an in-vehicle scene as a first recommended scene when the in-vehicle scene is identified as a low-frequency scene and the recognition result of the voice to be recognized is invalid voice data; or when the in-vehicle scene is identified as a low-frequency scene and voice input is not identified, determining the in-vehicle scene as a first recommended scene; under a first recommendation scene, combining historical data, searching a combined frequent set in a frequent set database according to a first recommendation rule to serve as an optimal frequent set, wherein the historical data comprises a use record of the frequent set; the combined frequent set is comprised of at least two frequent sets.

In an embodiment of the present application, the optimal frequent set determining module 210 is configured to determine the in-vehicle scene as the second recommended scene when the in-vehicle scene is identified as the high-frequency scene and the recognition result of the voice to be recognized is invalid voice data; or when the in-vehicle scene is identified as a high-frequency scene and the voice input is not identified, determining the in-vehicle scene as a second recommended scene; and under a second recommendation scene, combining the historical data, and determining a single frequent set as an optimal frequent set in a frequent set database according to a second preset recommendation rule.

In an embodiment of the present application, the optimal frequent set determining module 210 is configured to perform semantic recognition on the voice data to be recognized, and determine that the in-vehicle scene is a third recommended scene if no voice input is recognized within a first preset time period after the initial semantic intention is obtained; and under a third recommendation scene, determining a combined frequent set or a single frequent set as the best frequent set in a frequent set database according to a third preset recommendation rule based on the initial semantic intention and the historical data.
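The scene selection described in the preceding embodiments can be summarized in a small Python sketch. The enum `RecommendedScene`, the function name, and the way the inputs are encoded (a missing intent standing for both "invalid voice data" and "no voice input") are assumptions made for illustration, not details given in the text.

```python
from enum import Enum


class RecommendedScene(Enum):
    FIRST = 1   # low-frequency scene, no usable speech
    SECOND = 2  # high-frequency scene, no usable speech
    THIRD = 3   # initial intent obtained, then silence


def determine_recommended_scene(use_count, threshold, initial_intent, silent_after_intent):
    """Map the in-vehicle situation to a recommended scene.

    use_count:           how often the in-vehicle scene has been used
    threshold:           first preset threshold separating low and high frequency
    initial_intent:      recognized intent, or None for invalid or missing speech
    silent_after_intent: True if no voice input followed within the first
                         preset time period after the initial intent
    """
    if initial_intent is not None:
        # Guidance is only needed if the user stops after the first intent.
        return RecommendedScene.THIRD if silent_after_intent else None
    if use_count < threshold:
        return RecommendedScene.FIRST
    return RecommendedScene.SECOND


print(determine_recommended_scene(2, 5, None, False))
print(determine_recommended_scene(5, 5, None, False))
print(determine_recommended_scene(2, 5, "ac_on", True))
```

Returning `None` when the user keeps speaking after the initial intent is a design choice of this sketch; the text does not describe that case.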

In an embodiment of the present application, the optimal frequent set determining module 210 is configured to count the co-occurrence times of the combined frequent sets in the frequent set database to obtain a first statistical result, and determine the combined frequent set with the highest co-occurrence times in the first statistical result as the optimal frequent set, where the co-occurrence times of a combined frequent set are the number of times all the frequent sets in the combined frequent set occur together; when a plurality of combined frequent sets with the same co-occurrence times exist in the first statistical result, count the use times of those combined frequent sets in the historical data to obtain a second statistical result, and determine the combined frequent set with the highest use times in the second statistical result as the optimal frequent set; and if the combined frequent sets with the same co-occurrence times also have the same use times in the historical data, randomly determine the optimal frequent set from among the combined frequent sets with the same co-occurrence times.
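The three-stage selection in the first recommendation rule (highest co-occurrence count, then highest usage count, then a random pick) can be sketched as follows. The function name `pick_best_combined`, the dictionary inputs and the sample intents are hypothetical; an optional `rng` parameter is added only so that the random step can be made reproducible.

```python
import random


def pick_best_combined(co_occurrence, usage, rng=None):
    """Select the optimal combined frequent set.

    co_occurrence: {combined set (tuple of intents): co-occurrence count}
    usage:         {combined set: usage count in the historical data}
    """
    if not co_occurrence:
        raise ValueError("no combined frequent sets to choose from")
    rng = rng or random.Random()

    # Stage 1: highest co-occurrence count.
    top_co = max(co_occurrence.values())
    tied = [c for c, n in co_occurrence.items() if n == top_co]
    if len(tied) == 1:
        return tied[0]

    # Stage 2: among ties, highest usage count in the history.
    top_use = max(usage.get(c, 0) for c in tied)
    tied = [c for c in tied if usage.get(c, 0) == top_use]

    # Stage 3: still tied, so pick at random.
    return tied[0] if len(tied) == 1 else rng.choice(tied)


a = ("ac_on", "temp_25")
b = ("nav_home", "music_play")
c = ("window_close", "ac_on")
print(pick_best_combined({a: 5, b: 5, c: 1}, {a: 2, b: 7}))
```

The same pattern would apply to the second recommendation rule by using the support of single frequent sets as the first-stage key.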

In an embodiment of the present application, the optimal frequent set determining module 210 matches the usage area corresponding to the second recommended scene based on the mapping relationship between the scene in the vehicle and the semantic intention area, and marks the usage area as the target usage area; counting the support degree of all single frequent sets in the target use field, obtaining a third statistical result, and determining the single frequent set with the highest support degree in the third statistical result as the optimal frequent set, wherein the support degree of the single frequent set comprises the occurrence frequency of the single frequent set; if the third statistical result has a plurality of single frequent sets with the same support degree, counting the use times of the single frequent sets with the same support degree in the historical data to obtain a fourth statistical result, and determining the single frequent set with the highest use times in the fourth statistical result as the optimal frequent set; if the single frequent sets with the same support degree have the same use times in the historical data, the optimal frequent set is randomly determined in the single frequent sets with the same support degree.

In an embodiment of the present application, the best frequent set determination module 210 is configured to determine the frequent set corresponding to the initial semantic intent as a special frequent set, and extract the single frequent sets associated with the special frequent set from the historical data as associated frequent sets; calculate the probability that the special frequent set occurs in the historical data, and record it as a first probability; calculate the probability that the special frequent set and each associated frequent set occur together, and record it as a second probability, so as to obtain a plurality of second probabilities; determine the ratio of each second probability to the first probability as a confidence, so as to obtain a plurality of confidences; and determine the associated frequent set with the highest confidence as the optimal frequent set.

In an embodiment of the present application, the vehicle-mounted voice guidance device further includes a database construction module 240, configured to obtain voice data to be recognized, convert the voice data to be recognized into text data, and perform semantic intention recognition on the text data to obtain a plurality of alternative semantic intentions; calculate the support of each alternative semantic intention, classify the alternative semantic intentions whose support is greater than or equal to a preset support threshold as single frequent sets, store all the single frequent sets in the frequent set database, and classify the alternative semantic intentions whose support is smaller than the preset support threshold as non-frequent sets; initialize an empty matrix whose rows are the single frequent sets and whose columns are also the single frequent sets, and combine the single frequent sets of the matrix rows with the single frequent sets of the matrix columns to obtain a plurality of combined frequent sets; and count the co-occurrence times of the combined frequent sets, and store the combined frequent sets whose co-occurrence times are greater than a preset co-occurrence times threshold in the frequent set database, where the co-occurrence times of a combined frequent set are the number of times all the frequent sets in the combined frequent set occur together.
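The construction of the frequent set database can be sketched in Python under some stated assumptions. The text does not define how support or co-occurrence is measured, so this sketch assumes the recognized intents are grouped into sessions, takes support as the fraction of sessions containing an intent, and counts a co-occurrence whenever two intents appear in the same session. The function name `build_frequent_sets` and the restriction to pairs are likewise illustrative choices.

```python
from itertools import combinations


def build_frequent_sets(sessions, min_support, min_co_occurrence):
    """Return (single frequent sets, {combined pair: co-occurrence count})."""
    n = len(sessions)
    if n == 0:
        return [], {}
    session_sets = [set(s) for s in sessions]
    intents = sorted(set().union(*session_sets))

    # Support (assumed): fraction of sessions that contain the intent.
    support = {i: sum(i in s for s in session_sets) / n for i in intents}
    singles = [i for i in intents if support[i] >= min_support]

    # Square matrix whose rows and columns are both the single frequent sets.
    index = {intent: k for k, intent in enumerate(singles)}
    matrix = [[0] * len(singles) for _ in singles]
    for s in session_sets:
        present = sorted(i for i in s if i in index)
        for x, y in combinations(present, 2):
            matrix[index[x]][index[y]] += 1
            matrix[index[y]][index[x]] += 1

    # Keep only pairs strictly above the co-occurrence threshold.
    combined = {
        (x, y): matrix[index[x]][index[y]]
        for x, y in combinations(singles, 2)
        if matrix[index[x]][index[y]] > min_co_occurrence
    }
    return singles, combined


sessions = [
    ["ac_on", "temp_25"],
    ["ac_on", "temp_25", "music_play"],
    ["ac_on", "nav_home"],
    ["music_play"],
]
singles, combined = build_frequent_sets(sessions, min_support=0.5, min_co_occurrence=1)
print(singles, combined)
```

In the sample data, `nav_home` falls below the support threshold and is treated as a non-frequent set, so it never enters the matrix.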

In an embodiment of the present application, the database construction module 240 is configured to collect sample voice data in a second preset period of time and semantic intent corresponding to the sample voice data, and generate a semantic analysis log; periodically extracting a plurality of text data corresponding to the same semantic intention in a semantic analysis log at intervals of a third preset time period, counting the use times of all the text data, sequencing the use times of the text data according to the sequence from high to low to obtain a text data sequence, taking the text data with the preset quantity in the text data sequence as high-frequency text data, and storing the high-frequency text data into an intention database.
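The periodic extraction of high-frequency text data can be sketched with Python's `collections.Counter`. The log format (pairs of semantic intent and recognized text), the function name `build_intent_database` and the parameter `top_n` standing for the preset quantity are assumptions for illustration; scheduling the extraction at the third preset time period is left out.

```python
from collections import Counter, defaultdict


def build_intent_database(log_entries, top_n):
    """Keep the top_n most used texts for each semantic intent.

    log_entries: iterable of (semantic_intent, text) pairs from the
                 semantic analysis log
    """
    counts = defaultdict(Counter)
    for intent, text in log_entries:
        counts[intent][text] += 1
    # most_common sorts by use count from high to low.
    return {
        intent: [text for text, _ in counter.most_common(top_n)]
        for intent, counter in counts.items()
    }


log = (
    [("ac_on", "turn on the air conditioner")] * 3
    + [("ac_on", "open the AC")] * 2
    + [("ac_on", "air conditioner on")]
    + [("nav_home", "navigate home")]
)
db = build_intent_database(log, top_n=2)
print(db)
```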

It should be noted that, the apparatus provided in the foregoing embodiments and the method provided in the foregoing embodiments belong to the same concept, and the specific manner in which each module and unit perform the operation has been described in detail in the method embodiments, which is not repeated herein. In practical application, the device provided in the above embodiment may distribute the functions to different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above, which is not limited herein.

Referring to fig. 3, fig. 3 is a schematic structural view of an in-vehicle terminal according to an exemplary embodiment of the present application. It should be noted that the in-vehicle terminal 300 shown in fig. 3 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in fig. 3, the in-vehicle terminal 300 includes a processor 301, a memory 302, and a communication bus 303; the communication bus 303 is used for connecting the processor 301 and the memory 302; the processor 301 is configured to execute computer programs stored in the memory 302 to implement the methods of one or more of the embodiments described above.

The application provides a vehicle-mounted terminal, which comprises a processor, a memory, a transceiver and a communication interface, wherein the memory and the communication interface are connected with the processor and the transceiver and communicate with them, the memory is used for storing a computer program, the communication interface is used for carrying out communication, and the processor and the transceiver are used for running the computer program so that the vehicle-mounted terminal executes the steps of the method described above.

In this embodiment, the memory may include a random access memory (Random Access Memory, abbreviated as RAM), and may further include a non-volatile memory (non-volatile memory), such as at least one disk memory.

The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, abbreviated as CPU), a network processor (Network Processor, abbreviated as NP), etc.; it may also be a digital signal processor (Digital Signal Processor, abbreviated as DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The above embodiments are merely illustrative of the principles of the present application and its effectiveness, and are not intended to limit the application. Modifications and variations may be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the application. It is therefore intended that all equivalent modifications and changes made by those skilled in the art without departing from the spirit and technical spirit of the present application shall be covered by the appended claims.

Claims

1. A vehicle-mounted voice guidance method, the method comprising:

According to recommendation rules corresponding to the scene in the vehicle after awakening, determining an optimal frequent set meeting the actual intention of the user from a pre-constructed frequent set database; determining the semantic intention corresponding to the optimal frequent set as the optimal semantic intention, wherein the frequent set database comprises a plurality of frequent sets, and the frequent sets are alternative semantic intention larger than a preset support threshold;

Obtaining target text data through the optimal semantic intention and a pre-constructed intention database, converting the target text data into a voice instruction and then sending the voice instruction to a vehicle-mounted terminal so as to guide a target user, wherein the intention database comprises a corresponding relation between the text data and the semantic intention;

Determining the best frequent set that meets the actual intent of the user includes:

determining an in-vehicle scene with the use times smaller than a first preset times threshold as a low-frequency scene, and determining an in-vehicle scene with the use times larger than or equal to the first preset times threshold as a high-frequency scene;

Determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set conforming to the actual intention of a user in a pre-constructed frequent set database based on the recommended scene, wherein the recommended scene comprises a first recommended scene, a second recommended scene and a third recommended scene;

The first recommended scene is determined based on the semantic recognition result in the low-frequency scene, the second recommended scene is determined based on the semantic recognition result in the high-frequency scene, and the third recommended scene is determined based on the semantic recognition result in the low-frequency scene or the semantic recognition result in the high-frequency scene after the initial semantic intention is obtained.

2. The vehicle-mounted voice guidance method according to claim 1, wherein determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set conforming to actual intention of a user in a pre-constructed frequent set database based on the recommended scene, comprises:

when the in-vehicle scene is identified as a low-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a first recommended scene; or when the in-vehicle scene is identified as a low-frequency scene and voice input is not identified, determining the in-vehicle scene as a first recommended scene;

under the first recommendation scene, combining historical data, and searching a combined frequent set in a frequent set database according to a first recommendation rule to serve as an optimal frequent set, wherein the historical data comprises a use record of the frequent set; the combined frequent set is comprised of at least two frequent sets.

3. The vehicle-mounted voice guidance method according to claim 1, wherein determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set conforming to actual intention of a user in a pre-constructed frequent set database based on the recommended scene, comprises:

when the in-vehicle scene is identified as a high-frequency scene and the identification result of the voice to be identified is invalid voice data, determining the in-vehicle scene as a second recommended scene; or when the in-vehicle scene is identified as a high-frequency scene and voice input is not identified, determining the in-vehicle scene as a second recommended scene;

and under the second recommendation scene, combining the historical data, and determining a single frequent set as the best frequent set in the frequent set database according to a second preset recommendation rule.

4. The vehicle-mounted voice guidance method according to claim 1, wherein determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set conforming to actual intention of a user in a pre-constructed frequent set database based on the recommended scene, comprises:

after semantic recognition is carried out on voice data to be recognized to obtain initial semantic intention, if voice input is not recognized in a first preset time period, determining that the in-vehicle scene is a third recommended scene;

And under the third recommendation scene, determining a combined frequent set or a single frequent set as the best frequent set in the frequent set database according to a third preset recommendation rule based on the initial semantic intention and the historical data.

5. The vehicle-mounted voice guidance method according to claim 2, wherein determining the best frequent set according to the first recommendation rule comprises:

Counting the co-occurrence times of the combined frequent sets in the frequent set database to obtain a first statistical result, and determining the combined frequent set with the highest co-occurrence times in the first statistical result as the optimal frequent set, wherein the co-occurrence times of the combined frequent set comprise the co-occurrence times of all the frequent sets in the combined frequent set;

When a plurality of combined frequent sets with the same co-occurrence frequency exist in the first statistical result, counting the use frequency of the combined frequent sets with the same co-occurrence frequency in the historical data to obtain a second statistical result, and determining the combined frequent set with the highest use frequency in the second statistical result as an optimal frequent set;

and if the combined frequent sets with the same co-occurrence times also have the same use times in the historical data, randomly determining the optimal frequent set from among the combined frequent sets with the same co-occurrence times.

6. The vehicle-mounted voice guidance method according to claim 3, wherein determining the best single frequent set in the frequent set database according to a second preset recommendation rule comprises:

matching the use field corresponding to the second recommended scene based on the mapping relation between the scene in the vehicle and the semantic intention field, and marking the use field as a target use field;

counting the support degree of all single frequent sets in the target use field to obtain a third statistical result, and determining the single frequent set with the highest support degree in the third statistical result as the optimal frequent set, wherein the support degree of the single frequent set comprises the frequency of occurrence of the single frequent set;

If a plurality of single frequent sets with the same support degree exist in the third statistical result, counting the use times of the single frequent sets with the same support degree in the historical data to obtain a fourth statistical result, and determining the single frequent set with the highest use times in the fourth statistical result as the optimal frequent set;

And if the single frequent sets with the same support degree have the same use times in the historical data, randomly determining the optimal frequent set in the single frequent sets with the same support degree.

7. The vehicle-mounted voice guidance method according to claim 4, wherein determining the best frequent set in the frequent set database according to a third preset recommendation rule based on the initial semantic intent, comprises:

determining a frequent set corresponding to the initial semantic intention as a special frequent set, and extracting a single frequent set associated with the special frequent set from the historical data as an associated frequent set;

Calculating the probability of the special frequent set in the historical data, and marking the probability as a first probability; calculating the probability of the simultaneous occurrence of the special frequent set and each associated frequent set, and marking the probability as a second probability to obtain a plurality of second probabilities;

determining the ratio of each second probability to the first probability as a confidence coefficient to obtain a plurality of confidence coefficients;

and determining the associated frequent set with the highest confidence as the optimal frequent set.

8. The vehicle-mounted voice guidance method according to claim 1, wherein the process of constructing the frequent set database includes:

Acquiring voice data to be recognized, converting the voice data to be recognized into text data, and performing semantic intention recognition on the text data to obtain a plurality of alternative semantic intention;

Calculating the support degree of each alternative semantic intention, classifying the alternative semantic intention which is larger than or equal to a preset support degree threshold value into a single frequent set, storing all the single frequent sets into a frequent set database, and classifying the alternative semantic intention which is smaller than the preset support degree threshold value into a non-frequent set;

Initializing an empty matrix, wherein the rows of the empty matrix are the single frequent sets and the columns of the empty matrix are also the single frequent sets, and combining the single frequent sets of the matrix rows with the single frequent sets of the matrix columns to obtain a plurality of combined frequent sets;

And counting the co-occurrence times of the combined frequent sets, and storing the combined frequent sets which are larger than a preset co-occurrence time threshold value in a frequent set database, wherein the co-occurrence times of the combined frequent sets comprise the co-occurrence times of all the frequent sets in the combined frequent sets.

9. The vehicle-mounted voice guidance method according to claim 1, wherein the process of constructing the intention database includes:

Collecting sample voice data in a second preset time period and semantic intention corresponding to the sample voice data, and generating a semantic analysis log;

Periodically extracting a plurality of text data corresponding to the same semantic intention in the semantic analysis log at intervals of a third preset time period, counting the use times of all the text data, sequencing the use times of the text data according to the sequence from high to low to obtain a text data sequence, taking the text data with the preset quantity in the text data sequence as high-frequency text data, and storing the high-frequency text data into an intention database.

10. An in-vehicle voice guidance apparatus, the apparatus comprising:

The optimal frequent set determining module is used for determining an optimal frequent set which accords with the actual intention of a user from a pre-constructed frequent set database according to the recommendation rule corresponding to the scene in the vehicle after awakening; determining the semantic intention corresponding to the optimal frequent set as the optimal semantic intention, wherein the frequent set database comprises a plurality of frequent sets, and the frequent sets are alternative semantic intention larger than a preset support threshold; determining the best frequent set that meets the actual intent of the user includes: determining an in-vehicle scene with the use times smaller than a first preset times threshold as a low-frequency scene, and determining an in-vehicle scene with the use times larger than or equal to the first preset times threshold as a high-frequency scene; determining a recommended scene based on the low-frequency scene and the high-frequency scene, and determining an optimal frequent set conforming to the actual intention of a user in a pre-constructed frequent set database based on the recommended scene, wherein the recommended scene comprises a first recommended scene, a second recommended scene and a third recommended scene; the first recommended scene is determined based on the semantic recognition result in the low-frequency scene, the second recommended scene is determined based on the semantic recognition result in the high-frequency scene, and the third recommended scene is determined based on the semantic recognition result in the low-frequency scene or the semantic recognition result in the high-frequency scene after the initial semantic intention is obtained;

The voice guiding module is used for obtaining target text data through the optimal semantic intention and a pre-constructed intention database, converting the target text data into voice instructions and then sending the voice instructions to the vehicle-mounted terminal so as to guide a target user, wherein the intention database comprises the corresponding relation between the text data and the semantic intention.

11. The vehicle-mounted terminal is characterized by comprising a processor, a memory and a communication bus; the communication bus is used for connecting the processor and the memory; the processor is configured to execute a computer program stored in the memory to implement the vehicle-mounted voice guidance method according to any one of claims 1 to 9.