CN103853437A - Candidate item obtaining method and device - Google Patents
Publication number: CN103853437A
Application number: CN201210497317.1A
Authority: CN (China)
Prior art keywords: input, behavior data, data, entry, screen
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention provides a candidate item obtaining method and device. The method specifically comprises: collecting input behavior data of users in a certain geographic area; analyzing the collected input behavior data to obtain regional data of the geographic area; receiving an input sequence of a user in the geographic area; and obtaining candidate items corresponding to the input sequence according to the regional data of the geographic area. The method and device can improve the input efficiency of users.
Description
Technical Field
The present application relates to the field of text input technologies, and in particular, to a method and an apparatus for obtaining candidate items.
Background
With the popularization and development of computer and Internet technology, input methods for entering text on computers have become an increasingly integral part of daily life, and at the same time users expect input methods to be increasingly intelligent.
Mobile devices such as mobile phones have developed rapidly in recent years, and the performance of both their CPUs and their memory has improved greatly. Correspondingly, applications on mobile devices have become richer and their user interfaces friendlier.
If the input method installed on a mobile device can provide candidates based on geographic location, user input efficiency and the operating experience can be greatly improved. Chinese invention patent application No. 201110256454.1, entitled "system and method for dynamically adjusting candidate words based on geographical location on portable device" (hereinafter referred to as the prior-art solution), discloses a technical solution for dynamically adjusting candidate words based on geographic location on a portable device. Its method flow may specifically include:
step 1, a position updating module positions the current geographic position of the portable equipment in real time to obtain current geographic position information, and sends the current geographic position information to an engine module;
step 2, the engine module receives the current geographic position information and dynamically downloads candidate word data related to the geographic position corresponding to the current geographic position information from the network service module according to the requirement;
step 3, the engine module stores the downloaded candidate word data related to the geographic position into the dictionary module;
step 4, the input module generates a corresponding input signal according to the input action of the user and sends the input signal to the engine module;
step 5, the engine module receives the input signal, searches and obtains corresponding input candidate word information in the dictionary module, and sends the candidate word information to the candidate word generating module;
step 6, the candidate word generating module receives the candidate word information and generates an input candidate item list.
In the prior-art solution above, map data is the main source of the candidate word data. Map data usually includes names of service information such as business districts and restaurants, and can meet the input needs of the user to a certain extent, but it has the following limitations. First, the names of service information covered by map data are limited and cannot meet the user's need to input names that are not service information; for example, words such as "south of the Yangtze Style", "YuanFang", and "Techno" are not covered by map data. Second, the wording of map data is usually overly formal and cannot accommodate the colloquial input habits of some users; for example, some users prefer a colloquial short form for "Xinjiang office", and that short form does not exist in the map data. Third, map data has its own update cycle; if that cycle is too long, the candidate word data of the prior-art solution cannot be updated for a long time, so that if a restaurant newly opens in a business district, the prior-art solution cannot obtain the corresponding data in time. In summary, the prior-art solution cannot intelligently understand the input needs of the user, and the candidate word the user wants does not always appear during input, which reduces input efficiency and places higher demands on the intelligence of the input method.
In addition, under the prior-art solution, whenever the current geographic position located in real time changes, the mobile device needs to communicate with the network service module to download the candidate word data corresponding to the new geographic position. When the user of the mobile device changes geographic position frequently, this easily causes frequent and heavy communication overhead between the mobile device and the network service module.
Furthermore, candidate word data for different geographic positions accumulates day by day in the dictionary module on the mobile device and can easily consume the storage space of the mobile device.
In summary, one technical problem that those skilled in the art urgently need to solve is how to improve the input efficiency of the user.
Disclosure of Invention
The technical problem to be solved by the application is to provide a method for acquiring candidate items, which can improve the input efficiency of a user.
In order to solve the above problem, the present application discloses a method for obtaining a candidate item, including:
collecting input behavior data of users in a certain geographic area;
analyzing the collected input behavior data to obtain the regional data of the geographic area;
receiving an input sequence of a user in the geographic area;
and acquiring candidate items corresponding to the input sequence according to the regional data of the geographic area.
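As a hedged illustration only, the four steps above can be sketched as follows. The record layout (`region`, `sequence`, `entry`), the function names, and the trivial analysis (grouping distinct entries by input sequence) are assumptions made for this sketch and are not specified by the application.

```python
from collections import defaultdict

def collect(records, region):
    """Step 1: keep only the input behavior records produced in `region`."""
    return [r for r in records if r["region"] == region]

def analyze(region_records):
    """Step 2: derive regional data; here, a map from input sequence to entries."""
    regional_data = defaultdict(list)
    for r in region_records:
        if r["entry"] not in regional_data[r["sequence"]]:
            regional_data[r["sequence"]].append(r["entry"])
    return dict(regional_data)

def get_candidates(regional_data, sequence):
    """Steps 3 and 4: receive an input sequence and look up its candidates."""
    return regional_data.get(sequence, [])

# Hypothetical sample records; region names are illustrative.
records = [
    {"region": "Wudaokou", "sequence": "richang", "entry": "Richang"},
    {"region": "Wudaokou", "sequence": "chengtie", "entry": "Urban railway"},
    {"region": "Elsewhere", "sequence": "richang", "entry": "Daily life"},
]
regional_data = analyze(collect(records, "Wudaokou"))
print(get_candidates(regional_data, "richang"))
```

A real analysis step would apply the screening conditions described later in the specification rather than keep every entry.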
On the other hand, the application also discloses a device for acquiring candidate items, which comprises:
the data collection unit is used for collecting input behavior data of users in a certain geographic area;
the data analysis unit is used for analyzing the collected input behavior data to obtain the regional data of the geographic area;
the input sequence receiving unit is used for receiving an input sequence of a user in the geographic area; and
the candidate item acquisition unit is used for acquiring candidate items corresponding to the input sequence according to the regional data of the geographic area.
Compared with the prior art, the present application has the following advantages:
In the present application, candidate items corresponding to the input sequence of a user in a geographic area are obtained according to the regional data of that geographic area.
First, the regional data is obtained by analyzing the input behavior data of users in a geographic area, and its source is not limited to map data, so the effects that the limited vocabulary, formal wording, and long update cycle of map data have on candidate items can be effectively avoided. More importantly, users who share a geographic environment are likely to use input behavior to express characteristic information about that environment, so the regional data obtained by analyzing their input behavior data can embody the characteristics of the geographic area. The present application therefore applies the regional data of a geographic area to candidate acquisition during text input by users in that area, and can obtain candidate items that reflect the characteristics or characteristic information of the area, so that the candidate items the user wants can be obtained during text input, improving both the intelligence of the input method and the input efficiency of the user.
For example, when the regional data of a geographic area reflects a habitual mispronunciation in that area, the present application can automatically correct the erroneous input sequence and obtain the correct candidate item, sparing users in that area a series of manual correction operations and improving input efficiency.
Second, in the prior art, geographic positions correspond one-to-one with candidate word data. On the one hand, storing the candidate word data also requires storing the corresponding geographic position information, which easily consumes storage space on the server and the mobile device. On the other hand, the current geographic position is located in real time, and once it changes, the mobile device needs to communicate with the network service module to download the candidate word data for the new geographic position, which easily causes frequent and heavy communication overhead between the mobile device and the network service module.
In contrast, the regional data used in the text input process of the present application is specific to a geographic area, and specific geographic position information need not be considered when storing it, so storage space on the server and the mobile device can be saved. Moreover, even if the current geographic position of the mobile device changes, as long as the geographic area where the mobile device is located does not change, no communication with the server is needed to obtain the regional data of the geographic area, so communication overhead between the mobile device and the server can be saved.
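The communication saving described above can be illustrated with a small client-side cache keyed by geographic area rather than by exact position. This is a minimal sketch under stated assumptions: the class name `RegionalDataCache`, the `fetch` callable standing in for a server request, and the region identifiers are all hypothetical.

```python
class RegionalDataCache:
    """Keeps the regional data of the current area; refetches only on area change."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: region_id -> regional data
        self._region = None
        self._data = None
        self.fetch_count = 0     # number of simulated server round trips

    def data_for(self, region_id):
        if region_id != self._region:
            self._data = self._fetch(region_id)
            self._region = region_id
            self.fetch_count += 1
        return self._data

def fake_fetch(region_id):
    # Stand-in for a request to the server; the returned data is illustrative.
    return {"richang": ["Richang"]} if region_id == "Wudaokou" else {}

cache = RegionalDataCache(fake_fetch)
# The device moves three times inside one area, then enters another area.
for region in ["Wudaokou", "Wudaokou", "Wudaokou", "Beijing"]:
    cache.data_for(region)
print(cache.fetch_count)
```

Four position updates cause only two fetches here, because three of them fall inside the same area.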
Moreover, in the present application, determining the geographic area to which an input sequence belongs does not necessarily depend on the mobile device locating its own geographic position. That is, even if the mobile device has no positioning function such as GPS, the present application can still be implemented smoothly, and it therefore has good extensibility.
Further, in the existing technical solution, the lexicon related to the current geographic position information that is already loaded on the portable device may well have been loaded half a year ago or more, which easily leads to poor timeliness of the candidate word information finally obtained. Moreover, even when the candidate word data related to the current geographic position is downloaded from the network service module on demand, the data stored on the network service module side has usually been generated in advance, so the timeliness of the candidate word information finally obtained is still easily affected.
In the present application, the input behavior data of users in the geographic area on which the regional data is based may be collected in real time, where real-time collection means collecting data in the corresponding geographic area after the geographic area to which the input sequence belongs has been determined. The regional data obtained by analyzing data collected in real time is therefore also real-time, and so are the candidate items obtained from it. Compared with the prior art, the present application can thus improve the timeliness of the candidate items.
Drawings
Fig. 1 is a flowchart of embodiment 1 of a candidate obtaining method according to the present application;
Fig. 2 is a flowchart of embodiment 2 of a candidate obtaining method according to the present application;
Fig. 3 is a schematic diagram of an input method system according to the present application;
Fig. 4 is a block diagram of an embodiment of an apparatus for obtaining candidates according to the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The embodiments of the present application use regional data to denote all data that can be selected by the user and can be distinguished by geographic area attribute; the regional data can be obtained by analyzing the input behavior data of users in the geographic area. Here, the users in a geographic area may include users who reside in the area as well as users who pass through or stay only briefly, and the input behavior data of these users affects the regional data of the corresponding geographic area. For example, if users in a certain geographic area are sensitive to certain characteristic information, they are likely to express that information through their input behavior; the regional data obtained by analyzing the input behavior data of users in the geographic area can therefore serve as characteristic data of that area, distinguishing it from the data of other geographic areas.
The present application uses the regional data of a geographic area to serve the users in that area; specifically, candidate items corresponding to the input sequence of a user in the geographic area are obtained according to the regional data of the geographic area.
The regional data is obtained by analyzing the input behavior data of users in the geographic area, and its source is not limited to map data, so the effects that the limited vocabulary, formal wording, and long update cycle of map data have on candidate items can be effectively avoided. More importantly, users who share a regional environment are likely to use input behavior to express characteristic information about that environment, so the regional data obtained by analyzing their input behavior data can embody the characteristics of the geographic area. The present application therefore applies the regional data of a geographic area to candidate acquisition during text input by users in that area, and can obtain candidate items that reflect the characteristics or characteristic information of the area, so that the candidate items the user wants can be obtained during text input, improving both the intelligence of the input method and the input efficiency of the user.
Referring to Fig. 1, a flowchart of embodiment 1 of a candidate obtaining method of the present application is shown, which may specifically include:
Step 101, collecting input behavior data of users in a certain geographic area;
In the embodiment of the present application, a geographic area mainly denotes an area delimited for managing users and their input behavior data. Those skilled in the art can divide geographic areas according to actual needs; for example, the geographic areas of the present application can have a hierarchical relationship similar to that of administrative areas, such as country, province, city, district, and county. In addition, those skilled in the art may set the granularity of the geographic area according to actual needs. For a city, the granularity may specifically include province, city, district, street, community, and even building; typical examples of a community include "Wudaokou", "east king manor", "west king manor", and "gangjing", and examples of a building may include "fox search network building", "same-party building", and "purple building". For rural areas, the granularity may specifically include province, city, county, town, village, and the like. The present application does not limit the specific division or the specific granularity of the geographic area.
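One possible way to represent such a hierarchy, given purely as an assumption for illustration, is to store each geographic area as a path from the coarsest level to the finest. The helper name `enclosing_areas` and the placeholder level names are hypothetical.

```python
def enclosing_areas(area_path):
    """Return the area itself and every enclosing area, finest first."""
    return [area_path[:i] for i in range(len(area_path), 0, -1)]

# Placeholder names: each element is one level of the hierarchy.
area = ("country", "province", "city", "district", "community")
for level in enclosing_areas(area):
    print(" > ".join(level))
```

With this representation, a lookup that finds no regional data at the community level could fall back to the district, the city, and so on; whether to do so is a design choice that the application leaves open.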
In an embodiment of the present application, the step of collecting the input behavior data of users in a certain geographic area may specifically include:
Step S111, collecting the input behavior data of users who have successfully registered with a sub-region server, or who access the geographic area where the sub-region server is located, as the input behavior data of users in the corresponding geographic area.
In the embodiment of the present application, a sub-region server corresponds to a geographic area and can be used to manage the users in that area and their input behavior data. Input behavior data generally refers to all data related to the input behaviors that a user produces during text input. The input behaviors may specifically include entering an input sequence, committing an entry to the screen (an on-screen operation), backspacing, deleting, and so on, where backspace and delete operations may act on the input sequence or on an entry already displayed. The present application does not limit the specific input behaviors or input behavior data.
In practical application, a user can actively initiate registration with the sub-region server. One example of a registration process is as follows: when the user's device starts up and a network is available, a registration request is sent to the nearest sub-region server to indicate that the user is online; the sub-region server determines the registered geographic area where the user is located according to the registration request, and returns a registration success message to the user. The registration success message usually carries information such as the ID (identity) of the user in the registered geographic area and the name of the registered geographic area. Users who have registered successfully are in a peer-to-peer relationship with one another.
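The registration exchange described above can be sketched as a single request handler. This is an illustration under assumptions: the message fields (`client`, `type`, `user_id`, `region`), the function name `handle_registration`, and the sequential ID scheme are hypothetical and are not defined by the application.

```python
import itertools

_next_id = itertools.count(1)   # hypothetical per-server ID generator

def handle_registration(request, region_name, online_users):
    """Record the user as online and build a registration success message."""
    user_id = next(_next_id)
    online_users[user_id] = request["client"]
    return {"type": "register_ok", "user_id": user_id, "region": region_name}

online_users = {}
reply = handle_registration({"client": "ime-client-a"}, "Wudaokou", online_users)
print(reply)
```

The reply carries the two pieces of information named in the text: the ID of the user within the registered geographic area and the name of that area.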
In the embodiment of the present application, the users accessing the geographic area where the sub-region server is located may be actively discovered by the sub-region server. For example, the sub-region server may find the users in the geographic area it manages through port scanning, or it may send an access message to the input method clients in that area; if an input method client returns a response to the access message, that client is treated as a user accessing the geographic area.
In short, both users who have successfully registered with the sub-region server and users who access the geographic area where the sub-region server is located fall within the scope of users in the geographic area. One of the main differences between the two is that the former is initiated by the user, whereas the latter is initiated by the sub-region server.
In an application example of the present application, the workflow of the sub-region server may specifically include:
Step S201, managing the users in the geographic area and updating the states of the users;
In general, the user state may include an online state and an offline state.
Step S202, collecting input behavior data of the users in the geographic area;
Step S203, analyzing the collected input behavior data of the users to obtain regional data.
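The three workflow steps can be sketched as methods of one class. The sketch rests on assumptions: the class and method names are hypothetical, the analysis is passed in as a callable because the specification describes several possible analyses, and the rule of recording data only for online users is an illustrative choice, not a requirement stated above.

```python
class SubRegionServer:
    def __init__(self, region, analysis):
        self.region = region
        self.analysis = analysis      # callable: list of records -> regional data
        self.user_states = {}         # user ID -> "online" or "offline"
        self.records = []             # (user ID, input sequence, on-screen entry)

    def update_state(self, user_id, online):
        """Step S201: manage users and update their states."""
        self.user_states[user_id] = "online" if online else "offline"

    def collect(self, user_id, sequence, entry):
        """Step S202: collect input behavior data (online users only, by assumption)."""
        if self.user_states.get(user_id) == "online":
            self.records.append((user_id, sequence, entry))

    def analyze(self):
        """Step S203: analyze the collected data to obtain regional data."""
        return self.analysis(self.records)

def distinct_entries(records):
    # Placeholder analysis: the sorted set of on-screen entries.
    return sorted({entry for _, _, entry in records})

server = SubRegionServer("Wudaokou", distinct_entries)
server.update_state(1, online=True)
server.update_state(2, online=False)
server.collect(1, "richang", "Richang")
server.collect(2, "richang", "Daily life")   # ignored: user 2 is offline
print(server.analyze())
```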
In a preferred embodiment of the present application, the input behavior data of a successfully registered user may specifically include the input behavior data that the user generates in all geographic areas, or only the input behavior data that the user generates in the registered geographic area. For example, suppose a user lives in Wudaokou and works in Beijing, and is registered in both the Beijing and the Wudaokou geographic areas. For the Wudaokou sub-region server, the input behavior data generated by the successfully registered user in all geographic areas may include the data the user generates in Wudaokou, in Beijing, or even in other geographic areas, whereas the input behavior data generated in the registered geographic area includes only the data the user generates in the Wudaokou geographic area. Of the two, the former is richer, while the latter better reflects the characteristic information of a single geographic area.
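The two collection scopes compared above can be expressed as a simple filter. This is a sketch under assumptions: the record fields (`user`, `region`, `entry`), the scope labels, and the sample data are hypothetical.

```python
def select_records(records, registered_users, region, scope):
    """Pick the input behavior data of registered users under the given scope."""
    own = [r for r in records if r["user"] in registered_users]
    if scope == "all_regions":
        return own                    # richer data
    if scope == "registered_region":
        return [r for r in own if r["region"] == region]   # more area-specific
    raise ValueError(f"unknown scope: {scope}")

# Illustrative records; user 1 is registered, user 2 is not.
records = [
    {"user": 1, "region": "Wudaokou", "entry": "Richang"},
    {"user": 1, "region": "Beijing", "entry": "Daily life"},
    {"user": 2, "region": "Wudaokou", "entry": "Richang"},
]
print(len(select_records(records, {1}, "Wudaokou", "all_regions")))
print(len(select_records(records, {1}, "Wudaokou", "registered_region")))
```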
Step 102, analyzing the collected input behavior data to obtain regional data of the geographic area;
In the embodiment of the present application, the regional data may be used to denote data that can be selected by the user and can be distinguished by geographic area attribute.
The regional data may be obtained at the granularity of a vocabulary or lexicon, or at the granularity of an individual entry (especially in cloud input mode). An entry is not limited to a Chinese entry; it may also mix letters and digits, or be an entry in a language such as English, Japanese, Korean, or German.
In the embodiment of the present application, the input characteristics of a geographic area can be represented by regional input characteristics, which reflect the degree of association between input behavior data and the corresponding geographic area: the stronger the association, the better the data reflects the characteristics of that area. A preset regional input characteristic condition is a condition on the regional input characteristics; that is, the collected input behavior data that satisfies the condition is the data that is more strongly associated with the corresponding geographic area and better embodies its regional characteristics.
In a preferred embodiment of the present application, the regional input characteristics may be expressed as regional entry input characteristics, and the step of collecting input behavior data of users in a certain geographic area may specifically include:
screening, from the collected input behavior data, the input behavior data that meets a preset regional entry input characteristic condition;
the step of analyzing the collected input behavior data to obtain the regional data of the geographic area is then specifically: obtaining the regional data of the geographic area according to the screened input behavior data.
The present application can provide the following schemes for regional entry input characteristics:
Regional entry input characteristic scheme 1:
The regional entry input characteristics may specifically include the number of users; the preset regional entry input characteristic condition may specifically include a preset user number condition; the collected input behavior data may include on-screen entries. In some preferred embodiments, the collected input behavior data may further include a user identification and the user input sequence corresponding to each on-screen entry;
the step of screening, from the collected input behavior data, the input behavior data that meets the preset regional entry input characteristic condition includes:
Step S301, counting, for each on-screen entry in the collected input behavior data, the number of users who entered it;
Step S302, screening, from the collected input behavior data, the on-screen entries whose number of users meets the preset user number condition;
the step of obtaining the regional data of the geographic area according to the screened input behavior data may specifically include:
Step S303, taking each screened entry as a candidate item, and establishing a correspondence between the candidate item and an input sequence in at least one input mode.
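Steps S301 to S303 can be sketched as three small functions. This is a minimal sketch under assumptions: the record layout, the function names, the minimum-user threshold used as the preset user number condition, and the table `sequences_by_entry` (which supplies the input sequences of each entry in a full-spelling and a simple-spelling mode) are all hypothetical.

```python
from collections import defaultdict

def count_users(records):
    """S301: number of distinct users who committed each on-screen entry."""
    users = defaultdict(set)
    for user_id, _sequence, entry in records:
        users[entry].add(user_id)
    return {entry: len(ids) for entry, ids in users.items()}

def screen(user_counts, min_users):
    """S302: keep entries whose user count meets the (assumed) condition."""
    return {entry for entry, n in user_counts.items() if n >= min_users}

def build_regional_data(screened, sequences_by_entry):
    """S303: map input sequences, in one or more input modes, to candidates."""
    regional = defaultdict(list)
    for entry in sorted(screened):
        for sequence in sequences_by_entry.get(entry, []):
            regional[sequence].append(entry)
    return dict(regional)

records = [
    (1, "richang", "Richang"),
    (2, "richang", "Richang"),
    (1, "richang", "Daily life"),
]
# Hypothetical sequences: full spelling "richang" and simple spelling "rc".
sequences_by_entry = {"Richang": ["richang", "rc"], "Daily life": ["richang", "rc"]}

counts = count_users(records)
regional = build_regional_data(screen(counts, 2), sequences_by_entry)
print(regional)
```

Only the entry committed by at least two users survives the screening, and it becomes reachable from both input sequences.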
In the embodiment of the present application, an input mode refers to a correspondence between an entry and an input sequence, for example a full spelling input mode, a simple spelling input mode, a handwriting input mode, a stroke input mode, a five-stroke input mode, and the like.
In regional entry input characteristic scheme 1, the number of users indicates how many users in a geographic area entered a certain entry, and reflects how widely that entry is used in the area. In general, the larger the number of users, the wider the use of the entry and the stronger its association with the corresponding geographic area, so the preset regional entry input characteristic condition can be defined in terms of the number of users in order to screen out entries that are strongly associated with the geographic area and embody its regional characteristics.
After such entries are screened out, the correspondence between each entry and its input sequence in one or more input modes can be established, so that the entry can be conveniently output in the geographic area regardless of the input mode the user uses.
Of course, the user input sequence corresponding to each on-screen entry may be collected together with the on-screen entry. After the on-screen entries whose number of users meets the preset user number condition are screened out, the correspondence between each on-screen entry and the collected user input sequence is established, and candidate items are provided to users in the geographic area accordingly. The present application does not limit this.
Referring to Table 1, an example of input behavior data collected in a geographic area is shown, including the user identification, the input sequence, and the corresponding on-screen entry.
Table 1
User ID within the geographic area | Input sequence | On-screen entry |
1 | jianmian | Meet |
1 | richang | Daily life |
1 | richang | Richang |
2 | richang | Richang |
2 | pengtou | Meet up |
3 | chigefan | Have a meal |
3 | wudaokou | Wudaokou |
3 | chengtie | Urban railway |
3 | richang | Richang |
3 | richangcanguan | Richang restaurant |
Step S301 counts, for each on-screen entry in the collected input behavior data, the number of users who entered it. For example, Table 2 shows the statistics obtained from Table 1; for clarity, Table 2 also lists the input sequence corresponding to each on-screen entry.
Table 2: statistical data of Table 1
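As a hedged illustration of the count in step S301, the sketch below takes the rows of Table 1 whose input sequence is "richang" and counts the distinct users per on-screen entry. The variable names and the printed layout are assumptions made for this example.

```python
from collections import defaultdict

# Rows of Table 1 with input sequence "richang":
# (user ID, input sequence, on-screen entry).
richang_rows = [
    (1, "richang", "Daily life"),
    (1, "richang", "Richang"),
    (2, "richang", "Richang"),
    (3, "richang", "Richang"),
]

users_by_entry = defaultdict(set)
for user_id, sequence, entry in richang_rows:
    users_by_entry[(sequence, entry)].add(user_id)

# Number of distinct users per (input sequence, on-screen entry) pair.
stats = {key: len(users) for key, users in users_by_entry.items()}
for (sequence, entry), n in stats.items():
    print(f"{sequence} | {entry} | {n}")
```

Counting distinct users, rather than occurrences, keeps one user who commits the same entry many times from inflating the statistic.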
Those skilled in the art can set various preset user number conditions according to actual needs, and the present application does not limit the specific preset user number condition.
Some application examples of the preset user number condition are given here:
Example 1, the preset user number condition may be that the number of users who entered the on-screen entry in the collected input behavior data is greater than a first threshold, where the first threshold may be set by those skilled in the art according to actual needs, for example 10, 20, or even 200.
Example 2, the preset user number condition may be that the number of users who entered the on-screen entry ranks in the top N among the user counts of all on-screen entries in the input behavior data collected for the whole geographic area, where the ranking is in descending order and N may be set by those skilled in the art according to actual needs, for example 10, 20, or even 200.
Example 3, the preset user number condition may be that the number of users who entered the on-screen entry ranks in the top M among the user counts of all on-screen entries corresponding to the same input sequence, where the ranking is in descending order and M may be set by those skilled in the art according to actual needs. For example, when M = 1, only the on-screen entry with the most users for the input sequence is kept; when M = 2, the on-screen entries with the most and the second most users are kept; and so on. Taking the input behavior data corresponding to "richang" as an example, and assuming that the preset user number condition is that the entry has the most users for its input sequence, the entry finally screened is "Richang".
In addition, it should be noted that the number of users of each on-screen entry obtained by the screening may be stored together with the correspondence between the candidate item and its input sequence. In this way, after the candidate items corresponding to the input sequence are obtained in the subsequent step 104 according to the regional data of the geographic area, the order of the candidate items in the candidate list may be adjusted according to their numbers of users; for example, candidate items with more users are arranged ahead of candidate items with fewer users, and so on.
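The user-count screening of this scheme can be sketched as follows. This is only a minimal illustration under stated assumptions: the rows are a subset of Table 1, and the function names (`count_users`, `screen_by_threshold`, `screen_top_n`, `screen_top_m_per_sequence`) and the threshold values are hypothetical, not taken from the application.

```python
from collections import defaultdict

# A subset of the rows of Table 1: (user ID, input sequence, on-screen entry).
ROWS = [
    (1, "richang", "Daily life"),
    (1, "richang", "Richang"),
    (2, "richang", "Richang"),
    (2, "pengtou", "Bumper head"),
    (3, "chengtie", "Urban railway"),
    (3, "richang", "Richang"),
]


def count_users(rows):
    """Count the distinct users who input each on-screen entry."""
    users = defaultdict(set)
    for user_id, _sequence, entry in rows:
        users[entry].add(user_id)
    return {entry: len(ids) for entry, ids in users.items()}


def screen_by_threshold(counts, first_threshold):
    """Example 1: keep entries whose user count exceeds the first threshold."""
    return [entry for entry, n in counts.items() if n > first_threshold]


def screen_top_n(counts, n):
    """Example 2: keep the N entries with the most users in the whole area."""
    ranked = sorted(counts, key=lambda entry: counts[entry], reverse=True)
    return ranked[:n]


def screen_top_m_per_sequence(rows, m):
    """Example 3: for each input sequence, keep its M entries with the most users."""
    by_sequence = defaultdict(lambda: defaultdict(set))
    for user_id, sequence, entry in rows:
        by_sequence[sequence][entry].add(user_id)
    result = {}
    for sequence, entries in by_sequence.items():
        ranked = sorted(entries, key=lambda e: len(entries[e]), reverse=True)
        result[sequence] = ranked[:m]
    return result


counts = count_users(ROWS)
```

With these rows, "Richang" is the only entry input by more than two users, and it is also the entry kept for the input sequence "richang" when M = 1.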
Regional entry input characteristic scheme 2,
The regional entry input characteristic specifically comprises an input probability comparison; the preset regional entry input characteristic condition may specifically include a preset input probability comparison condition; the collected input behavior data may include on-screen entries; in some preferred embodiments, the collected input behavior data may further include the user input sequences corresponding to the on-screen entries;
the step of screening the input behavior data meeting the preset region entry input characteristic condition from the collected input behavior data may specifically include:
step S301, for each on-screen entry in the collected input behavior data of a certain geographic area, counting a first input probability, namely the probability of that entry among all on-screen entries in the collected input behavior data of that geographic area;
step S302, for each on-screen entry in the collected input behavior data of all geographic areas, counting a second input probability, namely the probability of that entry among all on-screen entries in the collected input behavior data of all geographic areas;
step S303, taking the ratio of the first input probability to the second input probability of an on-screen entry as the input probability comparison of that entry in the geographic area corresponding to the first input probability;
step S304, screening out, from the collected input behavior data of the geographic area, the on-screen entries whose input probability comparison meets the preset input probability comparison condition;
the step of obtaining the geographical region data according to the screened input behavior data may specifically include:
step S305, taking the screened on-screen entries as candidate items, and establishing a correspondence between each candidate item and an input sequence in at least one input mode.
The regional entry input characteristic scheme 2 compares the data distribution in a certain geographic area with the data distribution in all the geographic areas to obtain input probability comparison, and screens out entries which frequently appear in the geographic area and can embody regional characteristics. In particular, the data distribution may be represented by an input probability.
After screening out such entries, the correspondence between the entry and the input sequence in one or more input modes can be established, so that the entry can be conveniently output in the geographic area no matter what input mode is used by the user.
Of course, the user input sequence corresponding to an on-screen entry may also be collected together with the entry itself. After the on-screen entries whose input probability comparison meets the preset input probability comparison condition are screened out, the correspondence between each such entry and the collected user input sequence is established, and candidate items are provided to users in the geographic area accordingly. This is not limited by the present application.
In an application example of the present application, Tables 3, 4, and 5 respectively show the first input probabilities of on-screen entries of users in the Sichuan Province region, the second input probabilities of on-screen entries of users in all provinces, and the corresponding input probability comparisons.
TABLE 3
Entry word | Input probability |
What is | 0.00523994235541276176 |
He | 0.00460178861020483054 |
How to | 0.00424680611763221614 |
…… | …… |
Melon doll | 0.00001707681748498965 |
TABLE 4
Entry word | Input probability |
What is | 0.00513994235541276176 |
He | 0.00450178861020483054 |
How to | 0.00442680611763221614 |
…… | …… |
Melon doll | 0.00000055355348498965 |
TABLE 5
Entry word | Input probability comparison |
What is | 1.019455 |
He | 1.022213 |
How to | 0.959339 |
…… | …… |
Melon doll | 30.849444 |
Gua | 73.676168 |
It can be seen that, in the input behavior data of users in the Sichuan region and of users in all regions, "What is", "He" and "How to" are common entries whose input probabilities in the two data sets are close, so their input probability comparisons are close to 1; however, among Sichuan users, the input probabilities of regional entries such as "Melon doll" (about 30.8 times the all-region probability in Table 5), as well as the entries rendered as "what", "crash", "hard to live", "white of day", "bardon" and the like, are significantly higher than those of the same entries in all regions; therefore, these entries can be screened out and used as characteristic entries or characteristic data of the Sichuan region.
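Steps S303 and S304 can be sketched with the values of Tables 3 and 4. This is a minimal illustration only: the function names and the comparison condition (a ratio of at least 10) are assumptions made for this sketch; the application leaves the preset input probability comparison condition open.

```python
# First input probabilities (Table 3, Sichuan region).
REGIONAL = {
    "What is": 0.00523994235541276176,
    "He": 0.00460178861020483054,
    "How to": 0.00424680611763221614,
    "Melon doll": 0.00001707681748498965,
}

# Second input probabilities (Table 4, all provinces).
OVERALL = {
    "What is": 0.00513994235541276176,
    "He": 0.00450178861020483054,
    "How to": 0.00442680611763221614,
    "Melon doll": 0.00000055355348498965,
}


def probability_comparison(regional, overall):
    """Step S303: ratio of first to second input probability per entry."""
    return {
        entry: p / overall[entry]
        for entry, p in regional.items()
        if overall.get(entry, 0.0) > 0.0
    }


def screen_regional_entries(ratios, min_ratio):
    """Step S304 with an assumed condition: keep entries with ratio >= min_ratio."""
    return [entry for entry, ratio in ratios.items() if ratio >= min_ratio]


ratios = probability_comparison(REGIONAL, OVERALL)
characteristic = screen_regional_entries(ratios, min_ratio=10.0)
```

The computed ratios agree with Table 5 to the printed precision, and under the assumed condition only "Melon doll" is kept among these four entries.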
Regional entry input characteristic scheme 3,
In a preferred embodiment of the present application, the regional input characteristic may be represented by a regional error correction input characteristic, and in this case, the collected input behavior data may specifically include an input sequence, an input operation, and a corresponding entry on the screen;
the step of collecting the input behavior data of the user in the certain geographic area may specifically include:
step S501, screening out, from the collected input behavior data, the input behavior data in which one or more of the input sequence, the input operations and the corresponding on-screen entry meet a preset regional error correction input condition;
step S502, analyzing the collected input behavior data to obtain the regional data of the geographic area, specifically by obtaining the regional data of the geographic area according to the screened input behavior data.
In practical applications, some regions have non-standard pronunciation habits: for example, residents of a certain region do not distinguish r from l, residents of another region do not distinguish un from ong, or retroflex and non-retroflex initials are not distinguished (zh and z are not distinguished), and the like. In a normal text input process, the user inputs a code character string and selects an entry from the candidate items to put on the screen, which completes one text input; that is, normal text input usually comprises only one input and one on-screen operation. The pronunciation habits described above, however, are likely to cause abnormal text input. For instance, in example 1 of the present application, a user wants to input "good hot" ("haore") but does not distinguish r from l. The user first inputs the code character string "haole" in the edit box of the input sequence, cannot find the desired candidate, deletes "le" in the edit box, inputs "re", and finally puts "good hot" on the screen.
Because the input behavior data of the present application can include all data related to the input behavior generated while the user inputs text, the input sequence, the input operations and the corresponding on-screen entry can all be recorded in the input behavior data. Here, the input sequence represents the information passed to the input method through input operations; the input operations broadly refer to all operations such as inputting an input sequence, the on-screen operation, the backspace operation, the delete operation, the exchange operation and the replace operation; and the on-screen entry is the entry corresponding to the on-screen operation. In example 1 above, the input sequence is completed in two input operations, "haole" and "re"; the input operations may specifically include an operation of inputting "haole", an operation of deleting "le", an operation of inputting "re", an operation of putting "good hot" on the screen, and the like; and the resulting on-screen entry is "good hot". As another example, after inputting "senem", the user notices the typing error and changes "senem" into "senme" with an exchange operation.
The input behavior data meeting the preset regional error correction input condition in the embodiment of the application can be used for representing input behavior data corresponding to abnormal character input (hereinafter referred to as abnormal input behavior data); one of the main differences between the abnormal text input and the normal text input is that the former has error correction operation during the text input process, but the latter does not, and the difference causes the difference between the abnormal input behavior data and the normal input behavior data, so by analyzing the difference between the abnormal input behavior data and the normal input behavior data, the rule of the abnormal input behavior data can be summarized to obtain the corresponding preset regional error correction input condition.
In a preferred embodiment of the present application, the preset regional error correction input condition may include at least one of the following conditions: an error correction operation immediately follows the input operation of an input sequence that has not been put on the screen; and an error correction operation immediately follows the on-screen operation of an on-screen entry. The error correction operation may include one or more of the following operations: a backspace operation, a delete operation, an exchange operation, and a replace operation. It is understood that a person skilled in the art may adopt various error correction input conditions according to the regularities of the abnormal input behavior data, and the present application does not limit the specific error correction input conditions.
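The two conditions above can be checked against a recorded operation log roughly as follows. This is a sketch under stated assumptions: the log format (a list of `(kind, text)` pairs), the kind labels ("input", "commit" for the on-screen operation, and the four correction kinds) and the function name are all hypothetical and chosen only for illustration.

```python
# Operation kinds are illustrative labels chosen for this sketch.
CORRECTION_KINDS = {"backspace", "delete", "exchange", "replace"}


def find_error_corrections(operations):
    """Return (index, condition) for each correction operation that directly
    follows an input operation or an on-screen ("commit") operation."""
    hits = []
    for index in range(1, len(operations)):
        kind, _text = operations[index]
        previous_kind, _previous_text = operations[index - 1]
        if kind not in CORRECTION_KINDS:
            continue
        if previous_kind == "input":
            hits.append((index, "after_uncommitted_input"))
        elif previous_kind == "commit":
            hits.append((index, "after_commit"))
    return hits


# Example 1: "haole" is typed, "le" is deleted, "re" is typed, the entry is committed.
EXAMPLE_1 = [
    ("input", "haole"),
    ("delete", "le"),
    ("input", "re"),
    ("commit", "good hot"),
]

# Normal text input: one input and one on-screen operation.
NORMAL = [("input", "haore"), ("commit", "good hot")]
```

For example 1 the deletion at index 1 is reported as following a not-yet-committed input, while the normal log yields no hits and would not be screened as abnormal input behavior data.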
In the embodiment of the application, the region input characteristics can be used for representing the input characteristics in one geographic area, and can be used for reflecting the association degree of the input behavior data and the corresponding geographic area, and the stronger the association degree of the input behavior data and the corresponding geographic area, the more the characteristics of the corresponding geographic area can be reflected; the preset region input characteristic condition is a condition corresponding to the region input characteristic, that is, the data meeting the preset region input characteristic condition in the collected input behavior data is also the input behavior data which has strong association with the corresponding geographic area and can better embody the region characteristics.
While the above embodiments have been described in detail with respect to collecting input behavior data of users in a geographic area according to geographic input characteristics, it will be appreciated by those skilled in the art that one or more of the above analysis schemes may be employed as desired.
In addition, the three schemes above each correspond to one specific regional input characteristic, whereas the regional input characteristic can represent any input characteristic within a geographic area and is not limited to these three. A person skilled in the art can therefore adopt other regional input characteristics as needed (such as the input probability of an on-screen entry in the geographic area), together with a corresponding scheme for analyzing the collected input behavior data to obtain the regional data of the geographic area; that is, the three schemes should not be understood as limiting the application of the present application.
In this embodiment, the input sequence of the user may specifically include one or more of keyboard input, voice input, handwriting input, and gesture input. It should be noted that the input sequence in the broad sense may include the original input sequence as well as candidates obtained by converting the original input sequence, and the like. The process of converting an input sequence into candidate items belongs to the prior art and is therefore not described here again.
The application can be applied to input methods such as a keyboard input method, a voice input method, a handwriting input method, a hybrid input method and the like, which respectively receive corresponding input sequences, for example, the keyboard input method receives keyboard input, the voice input method receives voice input, the handwriting input method receives handwriting input or gesture input, the hybrid input method receives various hybrid inputs such as keyboard, voice, handwriting, gesture and the like, and the like. The description is mainly given by taking keyboard input, i.e. encoding string as an example, and other input sequences are referred to each other.
In summary, the present application does not impose limitations on the input sequence.
When an input sequence of a user is received, a geographic area to which the input sequence belongs, that is, a geographic area in which the user is located when the input sequence is input, may be determined.
In an embodiment of the application, the step of determining the geographic area to which the input sequence belongs may specifically include:
step S601, collecting location information corresponding to the input sequence;
step S602, obtaining a geographical area corresponding to the location information, that is, a geographical area to which the input sequence belongs, by matching according to a preset mapping relationship between the geographical area and the location information.
In practice, when a user inputs an input sequence, the corresponding geographic location information can be acquired according to the user's IP (Internet Protocol) address, GPS (Global Positioning System), or mobile network (such as a Wi-Fi wireless network, a cellular network, etc.). The specific method for collecting the geographic location information corresponding to the input sequence is not limited in the present application.
In an application example of the present application, the mapping relationship between the preset geographic area and the location information may be a mapping relationship between a preset geographic area and a corresponding geographic location information range (latitude and longitude range), and specifically may be:
if the longitude of the current geographic position information is larger than (or larger than or equal to) the starting value of the GPS longitude of the pre-stored geographic area;
and the current geographical location information longitude is less than (or less than or equal to) the GPS longitude cutoff value of the pre-stored geographical area;
the latitude of the current geographic position information is greater than (or equal to or greater than) the initial value of the GPS latitude of the pre-stored geographic area;
and the latitude of the current geographic position information is less than (or equal to or less than) the GPS latitude cut-off value of the pre-stored geographic area;
the geographical area indicated by the current geographical location information may be determined as the pre-stored geographical area successfully matched.
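The range matching described above can be sketched as a bounding-box lookup. This is a minimal illustration: the `AreaBounds` type, the function name, the area name and the coordinate values are invented for the sketch, and the inclusive comparison is one of the two variants ("greater than or equal to") that the text allows.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AreaBounds:
    """Pre-stored longitude/latitude range of one geographic area."""
    name: str
    lon_start: float
    lon_end: float
    lat_start: float
    lat_end: float


def match_area(lon: float, lat: float, areas: Iterable[AreaBounds]) -> Optional[str]:
    """Return the first pre-stored area whose range contains the position."""
    for area in areas:
        in_longitude = area.lon_start <= lon <= area.lon_end
        in_latitude = area.lat_start <= lat <= area.lat_end
        if in_longitude and in_latitude:
            return area.name
    return None  # no pre-stored geographic area matched


# Illustrative values only; real ranges would come from the preset mapping.
AREAS = [AreaBounds("area_a", 116.30, 116.36, 39.98, 40.01)]
```

A position of longitude 116.33 and latitude 39.99 falls inside the illustrative range and matches "area_a"; a position outside every stored range yields no match.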
In another application example of the present application, the geographic area to which the collected location information belongs can be obtained by querying various geographic service websites according to the collected location information.
In another embodiment of the present application, the location or the geographic area selected by the user may be used as the location information corresponding to the input sequence or the geographic area to which the input sequence belongs according to the selection operation of the user on the mobile device for the location or the geographic area. For example, a list or map of geographic areas may be presented on the mobile device, and the user may select a location or geographic area on the list or map of geographic areas.
In summary, the present application does not impose any limitations on the particular method of determining the geographic region to which the input sequence belongs.
Moreover, it can be seen that the determination of the geographic area to which the input sequence belongs does not necessarily depend on the positioning of the mobile device for the own geographic location information, that is, even if the mobile device does not have the positioning function of the own geographic location information, the method can still be implemented smoothly, so that the method has good expansibility.
Step 104, acquiring candidate items corresponding to the input sequence according to the regional data of the geographic area.
According to the above description, the granularity of the obtained regional data may be a vocabulary or a word stock, or may be a single entry (especially in the cloud input mode), where an entry is not limited to a Chinese entry and may also be an entry mixing letters and numbers, or an entry in a language such as English, Japanese, Korean, or German. Step 104 may therefore obtain the candidate items corresponding to the input sequence according to the vocabulary, the word stock, the entries, or even a direct correspondence between input sequences and candidate items.
When the region input characteristic is represented by a region error correction input characteristic, the region data of the embodiment of the present application may also be a corresponding relationship between an error input sequence and a correct input sequence, and accordingly, in a preferred embodiment of the present application, the step of obtaining the region data of the geographic region according to the screened input behavior data may specifically include:
step S701, when the preset regional error correction input condition is that an error correction operation immediately follows the input operation of an input sequence that has not been put on the screen, obtaining an error input sequence from the non-on-screen input sequence before the error correction operation, obtaining the correct input sequence corresponding to the error input sequence from the non-on-screen input sequence after the error correction operation, and establishing a correspondence between the error input sequence and the correct input sequence; and/or,
step S702, when the preset regional error correction input condition is that an error correction operation immediately follows the on-screen operation of an on-screen entry, obtaining an erroneous on-screen entry from the on-screen entry before the error correction operation, obtaining the correct on-screen entry corresponding to the erroneous on-screen entry from the on-screen entry after the error correction operation, and establishing a correspondence between an error input sequence and a correct input sequence according to the erroneous on-screen entry and the correct on-screen entry;
the step of obtaining the candidate items corresponding to the input sequence according to the regional data of the geographic area may specifically include:
step S703, correcting the input sequence by using the corresponding relation between the error input sequence and the correct input sequence to obtain an input sequence after error correction;
step S704, obtaining corresponding candidate items according to the input sequence after error correction.
As in example 1 above, the input sequence is completed in two input operations, "haole" and "re", and the error correction operation immediately following the input operation of the non-on-screen input sequence "haole" is the operation of deleting "le". Step S701 may use the operation of deleting "le" as a demarcation point and obtain the error input sequence from the non-on-screen input sequence before that operation. In example 1, the non-on-screen input sequence before the operation of deleting "le" is "haole", and the error input sequence may be obtained from it as follows:
in one embodiment of the invention, the entire "haole" may be used as the error input sequence; in another embodiment of the present invention, a part of "haole" may be selected as the error input sequence, with the selection proceeding from back to front. One selection principle is to select the sequence part involved in the operation of deleting "le", namely "le", as the error input sequence. Another selection principle is to select a sequence part that can be mapped to a candidate item and that includes the sequence part involved in the operation of deleting "le". Taking a pinyin sequence as an example, the sequence parts selected from back to front are "e", "le", "ole", "aole" and so on; of these, "le", "ole" and "aole" all include the sequence part involved in the operation of deleting "le", and any of them that can be mapped to a candidate item can therefore be used as the error input sequence.
In example 1, the non-on-screen input sequence after the operation of deleting "le" is "haore". To obtain the correct input sequence corresponding to the error input sequence from it, a sequence part equal in length to the error input sequence is usually selected from "haore" from back to front; for example, the error input sequence "le" corresponds to the correct input sequence "re".
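The derivation in step S701 can be sketched for the delete case of example 1. The function name and its parameters are hypothetical; the sketch covers only the two simplest selection principles (the deleted part alone, or the whole sequence) and assumes the deleted part is a non-empty suffix of the sequence typed so far.

```python
def derive_correction_pair(before, deleted, typed, whole_sequence=False):
    """Derive (error input sequence, correct input sequence) for step S701.

    before: non-on-screen input sequence before the delete, e.g. "haole"
    deleted: the part removed by the delete operation, e.g. "le"
    typed: the part typed after the delete, e.g. "re"
    """
    if not deleted or not before.endswith(deleted):
        raise ValueError("deleted part must be a non-empty suffix of the sequence")
    # Non-on-screen input sequence after the error correction, e.g. "haore".
    after = before[: -len(deleted)] + typed
    # Either the whole sequence or only the deleted part is the error sequence.
    error = before if whole_sequence else deleted
    # Select, from back to front, a part of equal length as the correct sequence.
    correct = after[-len(error):]
    return error, correct
```

For example 1, `derive_correction_pair("haole", "le", "re")` gives the pair ("le", "re"), and with `whole_sequence=True` it gives ("haole", "haore").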
To better illustrate the implementation of step S702, example 2 is given here, in which the input sequence is completed in two input operations, "haole" and "re". The input operations may specifically include an operation of inputting "haole", an operation of putting the entry for "haole" on the screen, an operation of deleting the last character of that entry (the character for "le"), an operation of inputting "re", an operation of putting "hot" on the screen, and the like. The on-screen entry is thus completed by two on-screen operations, and the final on-screen entry is "good hot";
step S702 may first use the deletion as a demarcation point and obtain the erroneous on-screen entry from the on-screen entry before the deletion: either the whole entry for "haole" or only its deleted character for "le". The correct on-screen entry corresponding to the erroneous on-screen entry is then obtained from the on-screen entry after the deletion; for example, the entry for "haole" corresponds to "good hot", the character for "le" corresponds to "hot", etc.;
in a preferred embodiment of the present application, the correspondence between the incorrect input sequence and the correct input sequence may also be a correspondence reflecting a mispronunciation habit in a certain geographical area. For example, in examples 1 and 2, the user actually corrects "l" in the input sequence to "r", and thus, the correspondence relationship can be established with "l" as the erroneous input sequence and "r" as the correct input sequence. This preferred embodiment is very suitable for some users in geographical areas with mispronunciation habits, and can correct the mispronunciation habits of the users into correct input sequences, such as the correspondence between z and zh, the correspondence between un and ong, and the like.
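Applying such pronunciation-level correspondences to a new input sequence (step S703) might look like the following sketch. The rule list, the function name and the strategy of replacing one occurrence at a time are assumptions made for illustration; the application does not prescribe how the correspondence is applied.

```python
# Assumed regional rules: (error input sequence, correct input sequence).
RULES = [("l", "r"), ("z", "zh")]


def corrected_variants(sequence, rules):
    """Return input sequences obtained by correcting one occurrence at a time."""
    variants = []
    for wrong, right in rules:
        start = sequence.find(wrong)
        while start != -1:
            # Skip positions that already spell the correct form (e.g. "zh").
            if not sequence.startswith(right, start):
                variant = sequence[:start] + right + sequence[start + len(wrong):]
                if variant not in variants:
                    variants.append(variant)
            start = sequence.find(wrong, start + 1)
    return variants
```

With these rules, "haole" yields the corrected sequence "haore", "zidao" yields "zhidao", and "zhidao" is left without variants because its "z" already belongs to "zh". Candidate items can then be looked up for the corrected sequences as in step S704.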
In the embodiment of the present application, the implementation process of establishing the correspondence between the error input sequence and the correct input sequence according to the erroneous on-screen entry and the correct on-screen entry may specifically include: obtaining the error input sequence corresponding to the erroneous entry (in practice, all pinyin sequences or character sequences that can be mapped to the erroneous entry), obtaining the correct input sequence corresponding to the correct entry (in practice, all pinyin sequences or character sequences that can be mapped to the correct entry), and finally establishing the correspondence between the error input sequence and the correct input sequence.
It should be noted that, the above is only an example, and the present application is not limited to specifically obtaining an erroneous entry on the screen according to the entry on the screen before the error correction operation, obtaining a correct entry on the screen corresponding to the erroneous entry according to the entry on the screen after the error correction operation, and establishing a correspondence between an erroneous input sequence and a correct input sequence according to the erroneous entry on the screen and the correct entry.
In another preferred embodiment of the present application, the input behavior data of the user in the geographic area according to which the regional data is based may be data collected in real time or periodically.
In the existing technical solution, dynamically downloading candidate word data corresponding to the current geographic location information and related to the geographic location from the network service module as required generally includes determining whether a word stock related to the current geographic location information is already loaded on the portable device, and if so, not downloading, otherwise, downloading. In this way, the word stock related to the current geographic location information, which has been loaded on the portable device, is likely to be loaded half a year ago or more, which easily results in poor timeliness of finally obtaining the candidate word information; moreover, even if the candidate word data related to the geographic position corresponding to the current geographic position information is downloaded from the network service module at that time, the candidate word data related to the geographic position stored at the network service module side is often generated in advance, and the timeliness of finally obtaining the candidate word information is still easily influenced.
In the embodiment of the present application, the real-time collection refers to collecting data in a corresponding geographic area after determining the geographic area to which the input sequence belongs in step 103, and the partitioned data obtained by analyzing the data collected in real time are also real-time, and the candidate items obtained by further obtaining are also real-time; therefore, compared with the prior art, when the input behavior data of the user in the geographic area based on the regional data is collected in real time, the timeliness of the candidate items can be improved.
In the embodiment of the present application, the periodic collection refers to collecting input behavior data of users in a geographic area according to a certain collection period, where the collection period may be determined by those skilled in the art according to actual needs, such as 24 hours, 12 hours, 6 hours, 1 hour, and so on; it will be appreciated that a relatively short collection period may ensure timeliness of the candidates.
Referring to fig. 2, a flowchart of embodiment 2 of a method for acquiring candidate items of a partitioned area according to the present application is shown, which may specifically include:
step S801, a client receives an input sequence of a user, collects location data corresponding to the input sequence, and uploads the input sequence and the corresponding location data to a server;
the location data here may specifically include: geographic location information acquired according to the IP address, the GPS of the mobile device, or the mobile network; or a location or geographic area determined according to a selection operation of the user on the mobile device for a location or a geographic area; and the like. The embodiment of the present application does not limit the specific location data.
Step S802, the server determines the geographic area to which the input sequence belongs according to the location data corresponding to the input sequence;
step S803, the server collects the input behavior data of the user in the geographical area to which the input sequence belongs;
step S804, the server analyzes the collected input behavior data to obtain the regional data of the geographic region;
step S805, the server obtains candidate items corresponding to the input sequence according to the geographical data of the determined geographical area.
It should be noted that the server may issue the candidate items to the client after obtaining the candidate items.
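The server-side flow of steps S802 to S805 can be sketched as a small in-memory class. Everything here is an illustrative assumption: the class and method names, the `locate` callback that maps location data to a geographic area, and the choice to use plain occurrence counts as the analysis in step S804.

```python
from collections import Counter, defaultdict


class RegionalCandidateServer:
    """Toy in-memory stand-in for the server of method embodiment 2."""

    def __init__(self, locate):
        self._locate = locate            # maps location data to a geographic area
        self._log = defaultdict(list)    # area -> [(input sequence, on-screen entry)]

    def record(self, location, sequence, entry):
        """Store one piece of input behavior data under its geographic area."""
        self._log[self._locate(location)].append((sequence, entry))

    def candidates(self, sequence, location):
        area = self._locate(location)                 # step S802
        behavior = self._log[area]                    # step S803
        regional = defaultdict(Counter)               # step S804
        for logged_sequence, entry in behavior:
            regional[logged_sequence][entry] += 1
        # Step S805: entries for this sequence, most frequent first.
        return [entry for entry, _count in regional[sequence].most_common()]


# Assumption: the uploaded location data is already an area name chosen by the user.
server = RegionalCandidateServer(locate=lambda location: location)
server.record("wudaokou", "richang", "Richang")
server.record("wudaokou", "richang", "Daily life")
server.record("wudaokou", "richang", "Richang")
```

A later request for "richang" from the same area returns "Richang" ahead of "Daily life", while the same input sequence from an area with no collected data returns no regional candidates.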
Compared with the method embodiment 1, in the method embodiment 2, the collecting of the behavior data of the user, the analyzing, the obtaining of the corresponding candidate items, and the like are all completed by the server.
In the existing technical scheme, a word bank related to current geographic position information loaded on a portable device is likely to be loaded half a year ago or more, which easily results in poor timeliness of finally obtaining candidate word information; moreover, even under the condition that the candidate word data related to the geographic position corresponding to the current geographic position information is downloaded from the network service module at the moment, the candidate word data related to the geographic position and stored at the network service module side are often generated in advance, and the timeliness of the finally obtained candidate word information is still easily influenced;
in the embodiment of the application, the input behavior data of the user in the geographic area according to which the regional data is based is data collected in real time, where the real-time collection refers to collecting data in the corresponding geographic area after determining the geographic area to which the input sequence belongs, so that the regional data obtained by analyzing the data collected in real time is also real-time, and further obtained candidate items are also real-time; therefore, compared with the prior art, the input behavior data of the user in the geographic area based on the regional data is collected in real time, and timeliness of the candidate items can be improved.
For example, if a "Richang restaurant" has newly opened in the Wudaokou area within the last few days and a user in the Wudaokou area inputs the entry "Richang restaurant", that entry can be collected in real time and provided as a candidate item to other users in the Wudaokou area who input a related input sequence; in the existing technical solution, by contrast, the word stock related to the current geographic location information loaded on the portable device is likely to have been loaded half a year ago or earlier, so it cannot keep up with such real-time changes.
It should be noted that, in some embodiments of the present application, the server may also issue the regional data obtained by analysis to the client, and the client then analyzes the input sequence according to the regional data to obtain the candidate items.
In addition, in implementing the technical solution of the present application, a person skilled in the art may classify the regional data of a geographic area according to actual requirements. For example, in one application example of the present application, the categories of regional data may specifically include organizations, jargon, dialects, etc. Organization data can be understood as data that online map data can cover, such as entry data representing organizations within a geographic area (business districts, restaurants, hotels, supermarkets, movie theaters, scenic spots, schools, banks, etc.). Jargon and dialect data can be understood as data that online map data cannot cover, such as entry data used by a small group of users in a geographic area and entry data corresponding to incorrect pronunciation habits in the geographic area.
In order to make the present application better understood by those skilled in the art, the categories of regional data are described below through specific application scenarios.
Application scenario 1
Application scenario 1 mainly concerns regional data of the organization category, which can be obtained by analyzing input behavior data of users in the corresponding geographic area, collected in real time or periodically.
For example, a mobile phone user hears that a new restaurant, 'Richang Restaurant', has opened in Wudaokou and wants to eat there. After arriving in Wudaokou, the user is not sure where the restaurant is, so the user opens an online map query on the mobile phone, enters the code string 'richangcanguan' and thereby accesses the server. The server determines that the geographic area to which 'richangcanguan' belongs is Wudaokou, collects the input behavior data of all users who have previously registered successfully in the Wudaokou area, obtains the regional data of the Wudaokou area from the collected input behavior data, and finds in the regional data the entry 'Richang Restaurant', which successfully matches 'richangcanguan'. The user can then enter the entry in one step, which avoids the trouble of selecting character by character (for example, selecting 'ri' first, then 'chang', and finally 'restaurant') and improves the user's input efficiency.
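As a rough illustration of this scenario, the following Python sketch builds regional data from collected (code string, on-screen entry) pairs and looks up an input sequence in it. The data layout, the function names `build_regional_data` and `get_candidates`, and the sample records (including the romanized entry name) are assumptions made for illustration and are not taken from the application.

```python
from collections import defaultdict

def build_regional_data(records):
    """Build a code-string -> entries mapping for one geographic area.

    records: iterable of (code_string, on_screen_entry) pairs collected
    from users in that area (hypothetical layout).
    """
    regional = defaultdict(list)
    for code, entry in records:
        if entry not in regional[code]:
            regional[code].append(entry)
    return dict(regional)

def get_candidates(regional_data, input_sequence):
    """Return the entries whose code string exactly matches the input sequence."""
    return list(regional_data.get(input_sequence, []))

# Hypothetical input behavior data collected in one area.
area_records = [
    ("richangcanguan", "Richang Restaurant"),
    ("richangcanguan", "Richang Restaurant"),
    ("hushou", "fox head"),
]
area_data = build_regional_data(area_records)
print(get_candidates(area_data, "richangcanguan"))
```

Exact matching on the whole code string is the simplest possible lookup; a real input method would more likely combine this with prefix matching and ranking.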
Application scenario 2
Application scenario 2 mainly concerns regional data of the jargon category, that is, entry data used by a small group of users in a geographic area. One difference from regional data of the organization category is that jargon-category data is popular, colloquial and customary in character, whereas organization-category data is formal. Regional data of the jargon category can likewise be obtained by analyzing input behavior data of users in the corresponding geographic area, collected in real time or periodically.
For example, many users in the Sohu Network Building register with the corresponding regional server, and entries such as "fox head", "happy and happy", "banquet nameplate", "permeability", "Techno", "CTR", "chuanmaster", "dog searching input board", "number pass" and "operation and maintenance" have a relatively high input probability among the on-screen entries of these users. Such entries cannot be covered by online map data, but they can serve as feature data, that is, regional data, of the Sohu Network Building. When a new mobile device then accesses the regional server of the Sohu Network Building, the regional data can be used directly to respond to the user's input; for example, if the user's input sequence is "hushou", the candidate item "fox head" can be provided directly.
Of course, regional data of the jargon category may also supply words that regional data of the organization category cannot cover, such as "Gangnam Style", "Yuanfang", "Xinjiang office", and the like.
Application scenario 3
Application scenario 3 mainly concerns regional data of the dialect category, that is, entry data corresponding to the pronunciation habits of users in a geographic area; regional data of the dialect category can be obtained by analyzing input behavior data of users in the corresponding geographic area, collected in real time or periodically.
When users in regions such as Zhejiang and Fujian type, they often do not distinguish r from l (for example, the pronunciation of "so hot" is typed as that of "happy", and "eat meat", pinyin "chirou", is typed as "chilou", and so on).
The input behavior data of the present application can include all data related to input behavior that a user generates during character input. The present application can therefore analyze the input behavior data of users in regions such as Zhejiang and Fujian to obtain the corresponding regional error correction attribute, use that attribute to correct the input sequence, and obtain the corresponding candidate items from the corrected input sequence.
In a specific implementation, the regional error correction attribute may describe a mapping between an erroneous habit and the correct habit, such as l/r, z/zh, un/ong, and the like; if the code string entered by a user in a region such as Zhejiang or Fujian is 'chilou', it can be corrected directly to 'chirou', and the corresponding candidate item 'eating meat' is obtained.
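The correction step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the error correction attribute is represented as a list of (erroneous, correct) substring pairs, only one substitution is tried at a time, and the lexicon contents and the names `ERROR_PAIRS`, `corrected_variants` and `candidates_with_correction` are hypothetical.

```python
# Hypothetical regional error correction attribute: (habitual error, correct form).
# The pairs follow the l/r, z/zh and un/ong examples; the layout is an assumption.
ERROR_PAIRS = [("l", "r"), ("z", "zh"), ("un", "ong")]

# Hypothetical lexicon: code string -> candidate entries.
LEXICON = {"chirou": ["eating meat"], "haore": ["so hot"]}

def corrected_variants(sequence, pairs):
    """Yield every sequence obtained by replacing one occurrence of an error pattern."""
    for wrong, right in pairs:
        start = sequence.find(wrong)
        while start != -1:
            yield sequence[:start] + right + sequence[start + len(wrong):]
            start = sequence.find(wrong, start + 1)

def candidates_with_correction(sequence, pairs, lexicon):
    """Look up the raw sequence first, then each single-substitution variant."""
    results = list(lexicon.get(sequence, []))
    for variant in corrected_variants(sequence, pairs):
        for entry in lexicon.get(variant, []):
            if entry not in results:
                results.append(entry)
    return results

print(candidates_with_correction("chilou", ERROR_PAIRS, LEXICON))
```

Variants that do not correspond to any lexicon key are simply ignored, so over-generating variants costs lookups but does not produce wrong candidates.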
In addition, as mentioned above, the geographic areas of the present application may have a hierarchical relationship similar to that of administrative regions, such as country, province, city, district and county, or even province, city, district, street, community and individual building. In a preferred embodiment of the present application, the method may therefore further comprise: when obtaining the candidate item corresponding to the input sequence according to the regional data of the geographic area fails, obtaining the candidate item corresponding to the input sequence according to the regional data of the upper-level geographic area of that geographic area. For example, the geographic area "Wudaokou Hualian" corresponds to the address No. 28 Chengfu Road, Haidian District; its upper-level geographic area is "Wudaokou", whose address range is Nos. 1-28 Chengfu Road, Haidian District. When obtaining the candidate item corresponding to the input sequence from the regional data of "Wudaokou Hualian" fails, the candidate item can be obtained from the regional data of the upper-level "Wudaokou" area. This preferred embodiment can increase the success rate of obtaining candidate items and widen the range of candidate items.
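The fallback to an upper-level geographic area can be sketched as a walk up a parent chain. In the Python sketch below, the region names, the `PARENT` and `REGIONAL_DATA` tables and the function name are all hypothetical; only the fallback order (current area first, then its upper-level area) follows the text.

```python
# Hypothetical hierarchy: each area names its upper-level area (None at the top).
PARENT = {"building": "neighborhood", "neighborhood": "district", "district": None}

# Hypothetical regional data per area: code string -> entries.
REGIONAL_DATA = {
    "building": {"hushou": ["fox head"]},
    "neighborhood": {"richangcanguan": ["Richang Restaurant"]},
    "district": {},
}

def get_candidates_with_fallback(region, input_sequence,
                                 parent=PARENT, data=REGIONAL_DATA):
    """Try the given area first, then each upper-level area in turn.

    Returns (area that supplied the candidates, candidates), or (None, [])
    when no level of the hierarchy has a match.
    """
    current = region
    while current is not None:
        found = data.get(current, {}).get(input_sequence)
        if found:
            return current, list(found)
        current = parent.get(current)
    return None, []

print(get_candidates_with_fallback("building", "richangcanguan"))
```

Returning the area that actually supplied the candidates makes it easy to see when a fallback happened.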
It should be noted that the above scheme for obtaining candidate items may be applied to an input method system, and referring to fig. 3, a schematic structural diagram of an input method system according to the present application is shown, which may specifically include: an input/output processing module 301, a code conversion module 302, a candidate generation module 303, a network service module 304, a region determination module 305 and an input method resource module 306; wherein,
an input/output processing module 301, which belongs to the UI (User Interface) layer of the system and is responsible for providing a user interface, receiving the user's input operations and on-screen (commit) operations, and displaying processing results;
a code conversion module 302: when the user performs an input operation, the input/output processing module 301 passes the user's input sequence to the code conversion module 302 in real time, and the code conversion module 302 completes the mapping from the user's input sequence to candidate items;
for the most common pinyin input method, the user's code string is an alphabetic sequence (QWERTY keyboard) or a numeric sequence (T9 keyboard), and the candidate items are the Chinese-character entries corresponding to the code string; that is, a phonetic-to-character conversion is performed, which usually involves querying various word lists or lexicons and a complex computation to find the optimal path or the n-best paths;
a candidate generation module 303: in the course of its work the code conversion module 302 needs to look up the relevant lexicons and obtain candidate items, and this part is mainly done by the candidate generation module 303;
taking phonetic-to-character conversion as an example, the candidate generation module 303 queries the user lexicon, the system lexicon, the classified lexicons and the cloud input lexicon according to the input string passed by the code conversion module 302, and obtains candidate results from each. In general, the candidate generation module 303 may send a query request to the input method resource module 306; when a network is available, real-time candidate items may also be obtained from the network service module 304;
a network service module 304, which is mainly responsible for the network-related operations of the input method, mainly in either of the following respects:
on the one hand, when the user inputs, the region determination result obtained by the region determination module 305 is sent to the server together with the input sequence passed by the code conversion module 302, and candidate items that are related to the geographic area to which the input sequence belongs and that match the conversion result of the input sequence are obtained from the server online in real time;
on the other hand, according to the region determination result obtained by the region determination module 305, regional data such as word lists and entries related to the geographic area to which the input sequence belongs is obtained from the server and updated into the input method resource module 306;
a region determination module 305, configured to obtain the location information corresponding to the input sequence or to obtain directly the geographic area to which the input sequence belongs; it can obtain the geographic location information corresponding to the input sequence from an IP address, the GPS of a mobile device, or a mobile network (such as a Wi-Fi wireless network or a cellular network), or, in response to the user's selection of a location or geographic area on the user's device, take the selected location or geographic area as the location information corresponding to the input sequence or the geographic area to which it belongs;
an input method resource module 306, configured to store the resources of the input method, which may specifically include a common lexicon, the regional data of the present application, and the like; the regional data may be issued by the server.
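The interaction between the candidate generation module 303, the input method resource module 306 and the network service module 304 might be sketched as follows. This Python sketch is only an illustration: the class `InputMethodResources`, the callable type `NetworkQuery`, the stand-in `fake_server`, the sample entries and the choice to list local results before remote ones are all assumptions, since the text does not specify interfaces or ordering.

```python
from typing import Callable, Dict, List, Optional

class InputMethodResources:
    """Stand-in for the input method resource module 306 (hypothetical API)."""

    def __init__(self, lexicon: Dict[str, List[str]]) -> None:
        self.lexicon = lexicon

    def query(self, input_sequence: str) -> List[str]:
        return list(self.lexicon.get(input_sequence, []))

# Assumed shape of a call into the network service module 304:
# (input sequence, region) -> regional candidate entries.
NetworkQuery = Callable[[str, str], List[str]]

def generate_candidates(input_sequence: str, region: str,
                        resources: InputMethodResources,
                        network_query: Optional[NetworkQuery] = None) -> List[str]:
    """Merge local candidates with regional candidates from the server, if reachable."""
    candidates = resources.query(input_sequence)
    if network_query is not None:
        try:
            remote = network_query(input_sequence, region)
        except OSError:
            remote = []  # network unavailable: keep local results only
        for entry in remote:
            if entry not in candidates:
                candidates.append(entry)
    return candidates

def fake_server(input_sequence: str, region: str) -> List[str]:
    # Pretend server holding regional data for a single region.
    data = {("hushou", "building-A"): ["fox head"]}
    return data.get((input_sequence, region), [])

local = InputMethodResources({"hushou": ["generic entry"]})
print(generate_candidates("hushou", "building-A", local, fake_server))
print(generate_candidates("hushou", "building-A", local))
```

Passing the network query as an optional callable keeps the local path usable when no network is available, which matches the "when a network is available" wording above.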
Corresponding to the foregoing method embodiment, the present application further discloses an apparatus for obtaining candidate items, and with reference to the structure diagram shown in fig. 4, the apparatus may specifically include:
a data collection unit 401, configured to collect input behavior data of users in a certain geographic area;
a data analysis unit 402, configured to analyze the collected input behavior data to obtain regional data of the geographic area;
an input sequence receiving unit 403, configured to receive an input sequence of a user in the geographic area; and
a candidate obtaining unit 404, configured to obtain a candidate corresponding to the input sequence according to the region data of the geographic area.
In a preferred embodiment of the present application, the data collection unit 401 may specifically include:
a screening subunit, configured to screen, from the collected input behavior data, the input behavior data that meets a preset regional entry input characteristic condition;
the data analysis unit 402 may be specifically configured to obtain the regional data of the geographic area according to the screened input behavior data.
In another preferred embodiment of the present application, the collected input behavior data includes an input sequence, an input operation, and the corresponding on-screen entry;
the screening subunit may further include:
an error correction screening module, configured to screen, from the collected input behavior data, the input behavior data in which one or more of the input sequence, the input operation and the corresponding on-screen entry meet a preset regional error correction input condition;
the data analysis unit 402 may be specifically configured to obtain, from the screened input behavior data, a correspondence between erroneous input sequences and correct input sequences, and to use the correspondence as the regional data of the geographic area.
In yet another preferred embodiment of the present application, the collected input behavior data comprises an on-screen entry.
In a preferred embodiment of the present application, the regional entry input characteristic may specifically include the number of users; the preset regional entry input characteristic condition may specifically include a preset user number condition;
the screening subunit may specifically include:
a first statistical module, configured to count, for each on-screen entry in the collected input behavior data, the number of users who entered it;
a number screening module, configured to screen, from the collected input behavior data, the on-screen entries whose number of users meets the preset user number condition;
the data analysis unit 402 may be specifically configured to use the screened on-screen entries as candidate items, and to establish a correspondence between each candidate item and its input sequence in at least one input mode.
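The user-number screening can be illustrated with a short Python sketch. It assumes that each record is a (user id, input sequence, on-screen entry) triple and that the preset user number condition is a simple minimum threshold; both are assumptions, as is the function name `screen_by_user_count`.

```python
from collections import defaultdict

def screen_by_user_count(records, min_users):
    """Keep on-screen entries entered by at least `min_users` distinct users.

    records: iterable of (user_id, input_sequence, on_screen_entry) triples
    (hypothetical layout). Returns a mapping input sequence -> candidate entries.
    """
    users = defaultdict(set)      # entry -> distinct users who entered it
    sequences = defaultdict(set)  # entry -> input sequences used to enter it
    for user_id, sequence, entry in records:
        users[entry].add(user_id)
        sequences[entry].add(sequence)

    regional = defaultdict(list)
    for entry, who in users.items():
        if len(who) >= min_users:  # assumed form of the user number condition
            for sequence in sorted(sequences[entry]):
                regional[sequence].append(entry)
    return dict(regional)

sample = [
    ("u1", "hushou", "fox head"),
    ("u2", "hushou", "fox head"),
    ("u1", "abc", "private word"),
    ("u1", "abc", "private word"),
]
print(screen_by_user_count(sample, min_users=2))
```

Counting distinct users rather than raw occurrences keeps one user's repeated private entries from being treated as regional vocabulary.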
In another preferred embodiment of the present application, the input characteristics of the regional vocabulary entry include input probability comparison; the preset region entry input characteristic conditions comprise preset input probability comparison conditions;
the screening subunit may specifically include:
a second statistical module, configured to compute, for an on-screen entry in the collected input behavior data of a given geographic area, a first input probability of that entry relative to all on-screen entries in the collected input behavior data of that geographic area;
a third statistical module, configured to compute, for an on-screen entry in the collected input behavior data of all geographic areas, a second input probability of that entry relative to all on-screen entries in the collected input behavior data of all geographic areas;
a probability comparison acquisition module, configured to take the ratio of the first input probability to the second input probability of an on-screen entry as the input probability comparison of that entry in the geographic area corresponding to the first input probability; and
a probability screening module, configured to screen, from the collected input behavior data of the geographic area, the on-screen entries whose input probability comparison meets the preset input probability comparison condition;
the data analysis unit 402 may be specifically configured to use the screened on-screen entries as candidate items, and to establish a correspondence between each candidate item and its input sequence in at least one input mode.
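The input probability comparison can be worked through with a small Python sketch. It assumes that an input probability is an entry's relative frequency among the on-screen entries considered, and that the preset condition is a minimum ratio; the function names and sample data are hypothetical.

```python
from collections import Counter

def input_probabilities(entries):
    """Relative frequency of each on-screen entry in a list of entries."""
    counts = Counter(entries)
    total = sum(counts.values())
    return {entry: count / total for entry, count in counts.items()}

def screen_by_probability_ratio(entries_by_region, region, min_ratio):
    """Keep entries whose regional probability is at least `min_ratio` times
    their probability over all regions (assumed form of the preset condition)."""
    first = input_probabilities(entries_by_region[region])  # within the region
    all_entries = [e for es in entries_by_region.values() for e in es]
    second = input_probabilities(all_entries)                # over all regions
    ratios = {entry: first[entry] / second[entry] for entry in first}
    return {entry: ratio for entry, ratio in ratios.items() if ratio >= min_ratio}

sample = {
    "A": ["fox head", "fox head", "hello", "hello"],
    "B": ["hello", "hello", "hello", "hello"],
}
# In region A, "fox head" has probability 0.5 against 0.25 overall (ratio 2.0),
# while "hello" has 0.5 against 0.75 overall (ratio below 1).
print(screen_by_probability_ratio(sample, "A", min_ratio=1.5))
```

A ratio well above 1 indicates an entry that is characteristic of the region, whereas an entry that is common everywhere has a ratio near or below 1 and is filtered out.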
In a preferred embodiment of the present application, the preset regional error correction input condition may include one or more of the following conditions: the error correction operation immediately follows the input operation of an input sequence that has not yet been committed to the screen; and the error correction operation immediately follows the commit operation of an on-screen entry;
the error correction operation may include one or more of the following operations: a backspace operation, a delete operation, an exchange operation, and a replace operation.
In another preferred embodiment of the present application, the data analysis unit 402 may specifically include:
a first analysis subunit, configured to, when the preset regional error correction input condition is that the error correction operation immediately follows the input operation of an input sequence, obtain an erroneous input sequence from the uncommitted input sequence before the error correction operation, obtain the correct input sequence corresponding to the erroneous input sequence from the uncommitted input sequence after the error correction operation, and establish a correspondence between the erroneous input sequence and the correct input sequence; and/or
a second analysis subunit, configured to, when the error correction operation immediately follows the commit operation of an on-screen entry, obtain an erroneous on-screen entry from the on-screen entry before the error correction operation, obtain the correct on-screen entry corresponding to the erroneous on-screen entry from the on-screen entry after the error correction operation, and establish a correspondence between an erroneous input sequence and a correct input sequence from the erroneous on-screen entry and the correct on-screen entry;
the candidate obtaining unit 404 may specifically include:
the error correction subunit is used for correcting the error of the input sequence by utilizing the corresponding relation between the error input sequence and the correct input sequence to obtain an input sequence after error correction;
and the post-error-correction acquisition subunit is used for acquiring corresponding candidate items according to the post-error-correction input sequence.
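The first of the two analysis cases (an error correction operation that immediately follows the typing of an uncommitted input sequence) can be sketched in Python as below. The event format, the restriction to backspace as the only error correction operation, and the function name `extract_correction` are assumptions made for illustration; the second case, which works on committed entries, is not covered.

```python
def extract_correction(events):
    """Derive an (erroneous sequence, correct sequence) pair from one composition.

    events: list of ("key", char), ("backspace", None) or ("commit", None)
    tuples (hypothetical log format). Returns None if no correction occurred.
    """
    buffer = []
    erroneous = None
    previous = None
    for kind, value in events:
        if kind == "key":
            buffer.append(value)
        elif kind == "backspace":
            # Snapshot the uncommitted sequence when the first correction
            # starts right after typing.
            if previous == "key" and erroneous is None:
                erroneous = "".join(buffer)
            if buffer:
                buffer.pop()
        elif kind == "commit":
            break
        previous = kind
    corrected = "".join(buffer)
    if erroneous is None or erroneous == corrected:
        return None
    return erroneous, corrected

# The user types "chilou", deletes "lou", types "rou", then commits.
log = ([("key", c) for c in "chilou"] + [("backspace", None)] * 3
       + [("key", c) for c in "rou"] + [("commit", None)])
print(extract_correction(log))
```

Pairs extracted this way from many users in one area could then be aggregated into the correspondence between erroneous and correct input sequences that the error correction subunit applies.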
In this embodiment of the application, preferably, the data collection unit 401 may be specifically configured to collect the input behavior data of users who have successfully registered with a regional server, or who access the geographic area where the regional server is located, as the input behavior data of users in the corresponding geographic area.
In a preferred embodiment of the present application, the apparatus may further include:
an upper-level candidate obtaining unit, configured to obtain the candidate item corresponding to the input sequence according to the regional data of the upper-level geographic area of the geographic area when obtaining the candidate item corresponding to the input sequence according to the regional data of the geographic area fails.
The embodiments in the present specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Since the apparatus and system embodiments are basically similar to the method embodiment, their description is relatively brief, and the relevant points can be found in the corresponding description of the method embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
The method and the apparatus for obtaining candidate items provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and the implementation of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of the present specification should not be construed as limiting the present application.
Claims (20)
1. A method for obtaining a candidate, comprising:
collecting input behavior data of users in a certain geographic area;
analyzing the collected input behavior data to obtain the regional data of the geographic area;
receiving an input sequence of a user in the geographic area;
and acquiring candidate items corresponding to the input sequence according to the regional data of the geographic area.
2. The method of claim 1, wherein the step of collecting input behavior data of users within a geographic area comprises:
screening, from the collected input behavior data, the input behavior data that meets a preset regional entry input characteristic condition;
and the step of analyzing the collected input behavior data to obtain the regional data of the geographic area is specifically: obtaining the regional data of the geographic area according to the screened input behavior data.
3. The method of claim 2, wherein the collected input behavior data comprises an input sequence, an input operation, and a corresponding on-screen entry;
the step of screening, from the collected input behavior data, the input behavior data that meets the preset regional entry input characteristic condition comprises:
screening, from the collected input behavior data, the input behavior data in which one or more of the input sequence, the input operation and the corresponding on-screen entry meet a preset regional error correction input condition;
and the step of analyzing the collected input behavior data to obtain the regional data of the geographic area is specifically: obtaining, according to the screened input behavior data, a correspondence between erroneous input sequences and correct input sequences as the regional data of the geographic area.
4. The method of claim 2, wherein the collected input behavior data comprises an on-screen entry.
5. The method of claim 4, wherein the regional entry input characteristics include a number of users; the preset regional entry input characteristic conditions comprise preset user number conditions;
the step of screening, from the collected input behavior data, the input behavior data that meets the preset regional entry input characteristic condition comprises:
counting, for each on-screen entry in the collected input behavior data, the number of users who entered it;
screening, from the collected input behavior data, the on-screen entries whose number of users meets the preset user number condition;
the step of obtaining the regional data of the geographic area according to the screened input behavior data comprises:
using the screened on-screen entries as candidate items, and establishing a correspondence between each candidate item and its input sequence in at least one input mode.
6. The method of claim 4, wherein the regional entry input characteristics comprise an input probability comparison; the preset region entry input characteristic conditions comprise preset input probability comparison conditions;
the step of screening, from the collected input behavior data, the input behavior data that meets the preset regional entry input characteristic condition comprises:
computing, for an on-screen entry in the collected input behavior data of a given geographic area, a first input probability of that entry relative to all on-screen entries in the collected input behavior data of that geographic area;
computing, for an on-screen entry in the collected input behavior data of all geographic areas, a second input probability of that entry relative to all on-screen entries in the collected input behavior data of all geographic areas;
taking the ratio of the first input probability to the second input probability of an on-screen entry as the input probability comparison of that entry in the geographic area corresponding to the first input probability;
screening, from the collected input behavior data of the geographic area, the on-screen entries whose input probability comparison meets the preset input probability comparison condition;
the step of obtaining the regional data of the geographic area according to the screened input behavior data comprises:
using the screened on-screen entries as candidate items, and establishing a correspondence between each candidate item and its input sequence in at least one input mode.
7. The method as claimed in claim 3, wherein the preset regional error correction input condition includes one or more of the following conditions: the error correction operation immediately follows the input operation of an input sequence that has not yet been committed to the screen; and the error correction operation immediately follows the commit operation of an on-screen entry;
the error correction operation includes one or more of the following operations: a backspace operation, a delete operation, an exchange operation, and a replace operation.
8. The method of claim 7, wherein the step of obtaining the geographical data of the geographical area according to the screened input behavior data comprises:
when the preset regional error correction input condition is that the error correction operation immediately follows the input operation of an input sequence, obtaining an erroneous input sequence from the uncommitted input sequence before the error correction operation, obtaining the correct input sequence corresponding to the erroneous input sequence from the uncommitted input sequence after the error correction operation, and establishing a correspondence between the erroneous input sequence and the correct input sequence; and/or
when the error correction operation immediately follows the commit operation of an on-screen entry, obtaining an erroneous on-screen entry from the on-screen entry before the error correction operation, obtaining the correct on-screen entry corresponding to the erroneous on-screen entry from the on-screen entry after the error correction operation, and establishing a correspondence between an erroneous input sequence and a correct input sequence from the erroneous on-screen entry and the correct on-screen entry;
the step of obtaining the candidate items corresponding to the input sequence according to the regional data of the geographic area comprises:
correcting the input sequence by using the correspondence between the erroneous input sequence and the correct input sequence to obtain a corrected input sequence;
and obtaining the corresponding candidate items according to the corrected input sequence.
9. The method of any one of claims 1 to 8, wherein the step of collecting input behavior data of users within a geographic area comprises:
collecting the input behavior data of users who have successfully registered with a regional server, or who access the geographic area where the regional server is located, as the input behavior data of users in the corresponding geographic area.
10. The method of claim 1, further comprising:
and when the candidate item corresponding to the input sequence is failed to be acquired according to the regional data of the geographic area, acquiring the candidate item corresponding to the input sequence according to the regional data of the upper-level geographic area of the geographic area.
11. An apparatus for obtaining candidates, comprising:
the data collection unit is used for collecting input behavior data of users in a certain geographic area;
the data analysis unit is used for analyzing the collected input behavior data to obtain the regional data of the geographic area;
the input sequence receiving unit is used for receiving an input sequence of a user in the geographic area; and
the candidate item acquisition unit is used for acquiring candidate items corresponding to the input sequence according to the regional data of the geographic area.
12. The apparatus of claim 11, wherein the data collection unit comprises:
a screening subunit, configured to screen, from the collected input behavior data, the input behavior data that meets a preset regional entry input characteristic condition;
the data analysis unit is specifically configured to obtain the regional data of the geographic area according to the screened input behavior data.
13. The apparatus of claim 12, wherein the collected input behavior data comprises an input sequence, an input operation, and a corresponding on-screen entry;
the screening subunit includes:
the error correction screening module is used for screening, from the collected input behavior data, the input behavior data in which one or more of the input sequence, the input operation, and the corresponding on-screen entry meet the preset regional error correction input conditions;
the data analysis unit is specifically configured to obtain a correspondence between erroneous input sequences and correct input sequences according to the screened input behavior data, and to use the correspondence as the regional data of the geographic area.
14. The apparatus of claim 12, wherein the collected input behavior data comprises an on-screen entry.
15. The apparatus of claim 14, wherein the regional entry input characteristics comprise a number of users; the preset regional entry input characteristic conditions comprise preset user number conditions;
the screening subunit includes:
the first statistical module is used for counting, for each on-screen entry in the collected input behavior data, the number of users who input that entry;
the number screening module is used for screening, from the collected input behavior data, the on-screen entries whose number of users meets the preset user number condition;
the data analysis unit is specifically configured to use the screened on-screen entries as candidate items, and to establish a correspondence between each candidate item and an input sequence in at least one input mode.
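The counting and screening modules of this claim might be sketched as below. The record layout `(user_id, on_screen_entry)` and the reading of the "preset user number condition" as a minimum threshold are assumptions made for the example; the claim itself does not fix either.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def screen_by_user_count(records: Iterable[Tuple[str, str]],
                         min_users: int) -> Set[str]:
    """Return on-screen entries input by at least `min_users` distinct users."""
    users_per_entry: Dict[str, Set[str]] = defaultdict(set)
    for user_id, entry in records:
        # A set counts each user once, however often they input the entry.
        users_per_entry[entry].add(user_id)
    return {entry for entry, users in users_per_entry.items()
            if len(users) >= min_users}


# Hypothetical behavior records for one geographic area.
records = [
    ("u1", "entry A"), ("u2", "entry A"), ("u3", "entry A"),
    ("u1", "entry B"), ("u1", "entry B"),
]
screened = screen_by_user_count(records, min_users=2)
```

Counting distinct users rather than raw occurrences keeps a single heavy user from promoting a private entry to a regional candidate.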
16. The apparatus of claim 14, wherein the regional entry input characteristics comprise an input probability comparison; the preset region entry input characteristic conditions comprise preset input probability comparison conditions;
said screening subunit comprises:
a second statistical module, configured to compute, for each on-screen entry in the collected input behavior data of a given geographic area, a first input probability of that entry among all on-screen entries in the collected input behavior data of that geographic area;
a third statistical module, configured to compute, for each on-screen entry in the collected input behavior data of all geographic areas, a second input probability of that entry among all on-screen entries in the collected input behavior data of all geographic areas;
the probability comparison acquisition module is used for taking the ratio of the first input probability to the second input probability of an on-screen entry as the input probability comparison of that entry in the geographic area corresponding to the first input probability; and
the probability screening module is used for screening, from the collected input behavior data of a given geographic area, the on-screen entries whose input probability comparison meets the preset input probability comparison condition;
the data analysis unit is specifically configured to use the screened on-screen entries as candidate items, and to establish a correspondence between each candidate item and an input sequence in at least one input mode.
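As a sketch of the probability comparison in this claim: the first input probability is an entry's share of all on-screen entries in one area, the second is its share across all areas, and the comparison is their ratio. The function names and the use of a minimum ratio as the "preset input probability comparison condition" are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def input_probability_ratios(region_entries: List[str],
                             all_entries: List[str]) -> Dict[str, float]:
    """Ratio of each entry's regional input probability to its global one."""
    region_counts = Counter(region_entries)
    global_counts = Counter(all_entries)
    region_total = sum(region_counts.values())
    global_total = sum(global_counts.values())
    ratios: Dict[str, float] = {}
    for entry, count in region_counts.items():
        first_probability = count / region_total
        second_probability = global_counts[entry] / global_total
        # Skip entries missing from the global data to avoid dividing by zero.
        if second_probability > 0:
            ratios[entry] = first_probability / second_probability
    return ratios


def screen_by_ratio(ratios: Dict[str, float], min_ratio: float) -> Set[str]:
    """Keep entries that are over-represented in the region."""
    return {entry for entry, ratio in ratios.items() if ratio >= min_ratio}


# Hypothetical on-screen entries; the global list includes the regional ones.
region_entries = ["a", "a", "b"]
all_entries = ["a", "a", "b", "b", "b", "c"]

ratios = input_probability_ratios(region_entries, all_entries)
regional_candidates = screen_by_ratio(ratios, min_ratio=1.5)
```

Here entry "a" makes up two thirds of the regional input but only one third of the global input, so its ratio is 2 and it passes the hypothetical threshold of 1.5.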
17. The apparatus of claim 13, wherein the preset regional error correction input conditions comprise one or more of the following conditions: the error correction operation immediately follows the input operation of an input sequence that has not been put on screen; and the error correction operation immediately follows the on-screen operation of an on-screen entry;
the error correction operation includes one or more of the following operations: a backspace operation, a delete operation, an exchange operation, and a replace operation.
18. The apparatus of claim 17, wherein the data analysis unit comprises:
the first analysis subunit is configured to, when the preset regional error correction input condition is that the error correction operation immediately follows the input operation of an input sequence that has not been put on screen, obtain an erroneous input sequence from the not-yet-on-screen input sequence before the error correction operation, obtain the correct input sequence corresponding to the erroneous input sequence from the not-yet-on-screen input sequence after the error correction operation, and establish a correspondence between the erroneous input sequence and the correct input sequence; and/or
a second analysis subunit, configured to, when the preset regional error correction input condition is that the error correction operation immediately follows the on-screen operation of an on-screen entry, obtain an erroneous on-screen entry from the on-screen entry before the error correction operation, obtain the correct on-screen entry corresponding to the erroneous on-screen entry from the on-screen entry after the error correction operation, and establish a correspondence between an erroneous input sequence and a correct input sequence according to the erroneous on-screen entry and the correct on-screen entry;
the candidate obtaining unit includes:
the error correction subunit is used for correcting the error of the input sequence by utilizing the corresponding relation between the error input sequence and the correct input sequence to obtain an input sequence after error correction;
and the post-error-correction acquisition subunit is used for acquiring corresponding candidate items according to the post-error-correction input sequence.
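The first analysis subunit might be approximated as below. This is a sketch under simplifying assumptions that the application does not state: the behavior log is a list of `(operation, text)` events, each error correction is recorded as a single event, and an "input" event carries the whole not-yet-on-screen sequence at that moment. The event names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Error correction operations named in claim 17.
CORRECTION_OPS = {"backspace", "delete", "exchange", "replace"}

Event = Tuple[str, Optional[str]]


def mine_correction_pairs(events: List[Event]) -> Dict[str, str]:
    """Map erroneous input sequences to the sequences typed after a correction."""
    pairs: Dict[str, str] = {}
    # Slide a three-event window: input, correction, input.
    for i in range(len(events) - 2):
        (kind_before, seq_before), (kind_op, _), (kind_after, seq_after) = events[i:i + 3]
        if (kind_before == "input" and kind_op in CORRECTION_OPS
                and kind_after == "input" and seq_before != seq_after):
            pairs[seq_before] = seq_after
    return pairs


# Hypothetical log: the user types a sequence, corrects it, retypes, then commits.
events = [
    ("input", "huiji"),
    ("backspace", None),
    ("input", "feiji"),
    ("commit", "飞机"),
]
correction_pairs = mine_correction_pairs(events)
```

The resulting mapping has the same shape as the correspondence that the error correction subunit applies to an incoming input sequence before candidates are looked up.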
19. The apparatus according to any one of claims 11 to 18, wherein the data collection unit is specifically configured to collect, as the input behavior data of users in the corresponding geographic area, the input behavior data of users who have successfully registered with a sub-region server or who access the geographic area where the sub-region server is located.
20. The apparatus of claim 11, further comprising:
the upper-level candidate item acquisition unit is used for acquiring the candidate item corresponding to the input sequence according to the regional data of the upper-level geographic area of the geographic area when acquiring the candidate item corresponding to the input sequence according to the regional data of the geographic area fails.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210497317.1A CN103853437A (en) | 2012-11-28 | 2012-11-28 | Candidate item obtaining method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210497317.1A CN103853437A (en) | 2012-11-28 | 2012-11-28 | Candidate item obtaining method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103853437A (en) | 2014-06-11 |
Family
ID=50861169
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210497317.1A (Pending) CN103853437A (en) | 2012-11-28 | 2012-11-28 | Candidate item obtaining method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853437A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105589570A (en) * | 2014-10-23 | 2016-05-18 | 北京搜狗科技发展有限公司 | Input error processing method and apparatus |
CN105589570B (en) * | 2014-10-23 | 2019-04-09 | 北京搜狗科技发展有限公司 | A kind of method and apparatus handling input error |
CN106227435A (en) * | 2016-07-20 | 2016-12-14 | 广东欧珀移动通信有限公司 | A kind of input method processing method and terminal |
CN108304078A (en) * | 2017-01-11 | 2018-07-20 | 北京搜狗科技发展有限公司 | A kind of input method, device and electronic equipment |
CN108304078B (en) * | 2017-01-11 | 2024-01-30 | 北京搜狗科技发展有限公司 | Input method and device and electronic equipment |
CN112000233A (en) * | 2020-07-29 | 2020-11-27 | 北京搜狗科技发展有限公司 | Method and device for processing association candidate |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11580104B2 (en) | Method, apparatus, device, and storage medium for intention recommendation | |
US10095711B2 (en) | Method and apparatus of recommending candidate terms based on geographical location | |
KR102137767B1 (en) | Dynamic language model | |
Han et al. | A stacking-based approach to twitter user geolocation prediction | |
US7809721B2 (en) | Ranking of objects using semantic and nonsemantic features in a system and method for conducting a search | |
US8145703B2 (en) | User interface and method in a local search system with related search results | |
US8732155B2 (en) | Categorization in a system and method for conducting a search | |
US8782041B1 (en) | Text search for weather data | |
US7921108B2 (en) | User interface and method in a local search system with automatic expansion | |
CN110019645B (en) | Index library construction method, search method and device | |
US20120296865A1 (en) | Terminal device and word stock update method thereof | |
US20090132646A1 (en) | User interface and method in a local search system with static location markers | |
JP2015118708A (en) | Method and apparatus for providing search results | |
CN102289467A (en) | Method and device for determining target site | |
CN107203526B (en) | Query string semantic demand analysis method and device | |
US11700229B2 (en) | Geolocation using reverse domain name server information | |
TW201933879A (en) | Method and device for content recommendation | |
EP3033693A1 (en) | Systems and methods for processing search queries utilizing hierarchically organized data | |
US20090132514A1 (en) | method and system for building text descriptions in a search database | |
KR102601545B1 (en) | Geographic position point ranking method, ranking model training method and corresponding device | |
CN103853437A (en) | Candidate item obtaining method and device | |
JP2011501849A (en) | Information map management system and information map management method | |
US20090132512A1 (en) | Search system and method for conducting a local search | |
WO2009064313A1 (en) | Correlation of data in a system and method for conducting a search | |
US20090132572A1 (en) | User interface and method in a local search system with profile page |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140611 |