Summary of the invention
In view of the above problems, the present invention is proposed to provide a system for determining POI validity based on address data in a network, and a corresponding method, which overcome the above problems or at least partially solve or mitigate them.
According to one aspect of the present invention, a system for determining POI validity based on address data in a network is provided. The system includes:
a POI acquiring unit, configured to use address data in the network, based on a search engine, to obtain multiple related POIs corresponding to the same POI name;
a statistic unit, configured to count the number of occurrences of the POI in the address data in the network; and
a POI determining unit, configured to determine the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network.
Preferably, each of the multiple related POIs is information corresponding to at least one preset attribute of the POI.
Preferably, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
Preferably, the statistic unit further includes:
a POI source acquiring module, configured to obtain the source of the POI;
a POI source reliability judging module, configured to judge whether the source is a reliable source; and
a statistical module, configured to count the number of occurrences of the POI in the address data in the network if the source is a reliable source, and otherwise not to count them.
Preferably, the POI determining unit further includes:
a judging subunit, configured to judge whether the number of occurrences of the POI in the address data in the network is higher than a predetermined threshold; and
a POI information determining subunit, configured to determine that the acquired POI is valid when the judging subunit judges yes.
Preferably, the reliable source is a source having a predetermined credibility.
Preferably, the source is a website or a web page.
According to another aspect of the present invention, a method for determining POI validity based on address data in a network is provided, including:
using address data in the network to obtain multiple related POIs corresponding to the same POI name;
counting the number of occurrences of the POI in the address data in the network; and
determining the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network.
Preferably, each of the multiple related POIs is information corresponding to at least one preset attribute of the POI.
Preferably, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
Preferably, the step of counting the number of occurrences of the POI in the address data in the network further includes:
obtaining the source of the POI; and
judging whether the source is a reliable source; if so, counting the number of occurrences of the POI in the address data in the network, and otherwise not counting them.
Preferably, the step of determining the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network further includes:
judging whether the number of occurrences of the POI in the address data in the network is higher than a predetermined threshold; and
if so, determining that the POI is valid.
Preferably, the reliable source is a source having a predetermined credibility.
Preferably, the source is a website or a web page.
The beneficial effects of the present invention are as follows:
The present invention uses address data in the network to obtain multiple related POIs corresponding to the same POI name, and determines the valid POI corresponding to that POI name according to the number of occurrences of the POI in the address data in the network. This enables a user to quickly and accurately find the one or more POI names corresponding to a POI address at the same longitude and latitude. A network voting mechanism then filters the one or more POI names according to their information sources and the frequency with which they appear on the Internet, and a POI name with high credibility is selected as the POI name corresponding to the current POI address, which improves the validity of the POI.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Detailed description of the invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and shall not be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "including" used in the description of the present invention means that the stated features, integers, steps, operations, elements, and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined, are not to be interpreted in an idealized or overly formal sense.
Fig. 1 schematically shows a block diagram of a system for determining POI validity based on address data in a network according to an embodiment of the present invention.
Referring to Fig. 1, the system for determining POI validity based on address data in a network according to this embodiment of the present invention includes:
a POI acquiring unit 11, configured to use address data in the network, based on a search engine, to obtain multiple related POIs corresponding to the same POI name;
In this embodiment of the present invention, each of the multiple related POIs is information corresponding to at least one preset attribute of the POI. Further, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
a statistic unit 12, configured to count the number of occurrences of the POI in the address data in the network; and
a POI determining unit 13, configured to determine the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network.
In this embodiment of the present invention, address data is crawled from network data based on a search engine, and the address data includes a name field and address information. An example of map address data mined from the Internet based on a search engine is: Name: Hengda Real Estate Group Kunming Company; Address: 14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming. Here, "Hengda Real Estate Group Kunming Company" is the name of the POI, and "14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming" is the address of the POI. Resolving the address to coordinates yields the longitude and latitude of the address; for example, resolving the address "14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming" gives longitude 102.733445° E and latitude 25.08108° N. In addition, the number of times the POI appears on the Internet needs to be counted and its sources recorded.
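As a minimal sketch of how such a mined record might be represented, assuming Python and a hypothetical `AddressRecord` type (the field names and the rounding precision are illustrative and are not taken from the specification), records can be grouped by their resolved coordinates so that all names found at one location are examined together:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRecord:
    name: str      # name field (candidate POI name)
    address: str   # address information as published on the source page
    lon: float     # longitude resolved from the address
    lat: float     # latitude resolved from the address
    source: str    # website or web page the record was mined from


def group_by_location(records, digits=5):
    """Group records whose coordinates agree after rounding to `digits` places.

    The rounding precision is an assumption; the specification only says
    that records share the same longitude and latitude.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (round(rec.lon, digits), round(rec.lat, digits))
        groups[key].append(rec)
    return dict(groups)
```

Each resulting group holds every name and source observed at one location, which is the input the counting and clustering steps operate on.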
Thus, the POI information from different information sources corresponding to the address data finally mined from the Internet takes the form shown in Table 1, as follows:
Table 1: POI information from different information sources
As Table 1 shows, POI data obtained from different source web pages for the same geographic location (identical longitude and latitude) may contain duplicate data; that is, the same address (longitude and latitude) may carry multiple POI names. For example, several companies exist at the same longitude and latitude in Table 1: their actual POI longitude and latitude are identical, but their POI names and POI addresses are described differently. It can also be seen that the same POI name may be written in several different ways, such as "Baoshan Mingzhi Automobile Sales Co., Ltd." and "Baoshan Mingzhi Automobile Sales and Service Co., Ltd.". Such duplicate POI data prevents a user from quickly and accurately finding the POI name corresponding to the POI address of a given POI geographic location (longitude and latitude).
In this embodiment of the present invention, address data in the network is used, based on a search engine, to obtain multiple related POIs corresponding to the same POI name, where each related POI is information corresponding to at least one preset attribute of the POI and the preset attribute is longitude and latitude, address, building name, or affiliated organization. The valid POI corresponding to the same POI name is then determined according to the number of occurrences of the POI in the address data in the network.
Further, the step of determining the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network includes: according to the preset-attribute information of the related POIs, clustering by keyword the name fields that correspond to the same address information; counting the frequency with which each name field occurs within each class after clustering, as a second frequency; determining, according to the second frequency, the POI name of each class that corresponds to the address information; and using the Internet "voting" mechanism to select the credible, valid POI for the same POI name.
Further, one or more keywords are determined based on the name fields, the keywords corresponding to the same address information are clustered, and the clustered name fields are determined according to the clustered keywords.
Further, the names in the name fields are segmented into words, and the keyword of each name field is obtained from the resulting words.
Further, the frequency with which each word corresponding to the same address information occurs is counted as a first frequency, and the keyword of the name field is generated according to the first frequency. Specifically, the word that has the lowest first frequency and is not a place name is selected as the keyword of the name field.
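The keyword rule above (lowest first frequency, not a place name) can be sketched as follows. This is only an illustration in Python: the function name `select_keyword`, the pre-segmented token lists, the place-name set, and the frequency values are all hypothetical, and word segmentation itself is assumed to have been done by some external tool.

```python
def select_keyword(name_tokens, token_freq, place_names):
    """Return the rarest token of one name that is not a place name.

    name_tokens: words obtained by segmenting one POI name
    token_freq:  mapping word -> first frequency (unknown words count as 0)
    place_names: set of words to exclude because they are place names
    Returns None when every token is a place name.
    """
    candidates = [t for t in name_tokens if t not in place_names]
    if not candidates:
        return None
    # On a tie, min() keeps the first candidate in name order.
    return min(candidates, key=lambda t: token_freq.get(t, 0))


# Hypothetical frequencies; a real table would come from counting a corpus.
token_freq = {
    "Baoshan": 5000, "Mingzhi": 12, "Automobile": 90000,
    "Sales": 150000, "Service": 120000, "Co.": 900000,
}
keyword = select_keyword(
    ["Baoshan", "Mingzhi", "Automobile", "Sales", "Service", "Co."],
    token_freq,
    place_names={"Baoshan"},
)
```

Because generic words such as "Sales" or "Co." are frequent, the rare distinguishing word ends up as the keyword, which is what allows differently worded variants of one name to share a keyword.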
Further, the present invention may take the name field with the highest second frequency in each class as the class identification name and take every class identification name as a POI name corresponding to the address information; alternatively, it may take the name field with the highest second frequency in each class as the class identification name and take the class identification name that appears most often on the network as the POI name corresponding to the address information.
Specifically, the POI names in the mined address data are segmented into words, and the number of occurrences of each word after segmentation is counted. Within a given POI name, the word that occurs least frequently (and therefore carries the most information) and is not a place name is taken as the keyword of that POI name. For example, the data obtained by segmenting the POI names in the related POIs corresponding to the address data in Table 1 is shown in Table 2 (word frequencies were computed over about 90 million POI names); the second column of Table 2 is the extracted keyword, as follows:
Table 2: Data after word segmentation of POI names
Clustering by keyword: POI names that share the same keyword are assigned to the same class. The POI names above can be grouped into 5 classes; that is, there are 5 different POI names at this POI address, namely:
A: Baoshan Boxinyuan Automobile Trading Co., Ltd.;
B: Yunnan Lancang River Beer Enterprise Group Baoshan Co., Ltd.; Yunnan Lancang River Beer Enterprise Group Baoshan Co., Ltd. (map marker);
C: Baoshan Mingzhi Automobile Sales Co., Ltd.; Baoshan Mingzhi Automobile Sales and Service Co., Ltd.;
D: Baoshan Great Wall Motor 4S Store;
E: Baoshan Rongyitong Sales Co., Ltd. (Chevrolet 4S Store).
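The grouping just listed can be sketched in Python as below, under the assumption that each observed name already carries its keyword. The names `cluster_names` and `class_label` are hypothetical; the per-class count of each name plays the role of the second frequency.

```python
from collections import Counter, defaultdict


def cluster_names(observations):
    """Cluster observed names by keyword.

    observations: iterable of (name, keyword) pairs, one pair per
    occurrence found on the network.
    Returns {keyword: Counter({name: number of occurrences})}.
    """
    clusters = defaultdict(Counter)
    for name, keyword in observations:
        clusters[keyword][name] += 1
    return dict(clusters)


def class_label(name_counts):
    """Return the name with the highest count within one class."""
    return name_counts.most_common(1)[0][0]
```

A class that contains a single name is labeled by that name, and a class with several variants is labeled by its most frequently observed variant.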
To further illustrate the advantages of the invention, the internal structure of another embodiment of the statistic unit 12 in the system for determining POI validity based on address data in a network is disclosed below, showing the details of another embodiment implemented with the statistic unit 12. Referring to Fig. 2, the statistic unit 12 further includes a POI source acquiring module 121, a POI source reliability judging module 122, and a statistical module 123:
the POI source acquiring module 121 is configured to obtain the source of the POI;
the POI source reliability judging module 122 is configured to judge whether the source is a reliable source; and
the statistical module 123 is configured to count the number of occurrences of the POI in the address data in the network if the source is a reliable source, and otherwise not to count them.
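The behavior of the three modules can be sketched as a single function, assuming Python and assuming that reliability is decided by membership in a given set of sources (how that set is built is outside this sketch; the function name is hypothetical):

```python
from collections import Counter


def count_reliable_occurrences(occurrences, reliable_sources):
    """Count POI occurrences, skipping those from unreliable sources.

    occurrences:      iterable of (poi_name, source) pairs
    reliable_sources: set of sources treated as reliable
    """
    counts = Counter()
    for poi_name, source in occurrences:
        if source in reliable_sources:   # judging module 122
            counts[poi_name] += 1        # statistical module 123
    return counts
```

Occurrences from sources outside the set are simply ignored, so a name that appears only on unreliable pages never enters the count.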
In this embodiment, the best POI name within a class is chosen by "voting" on the Internet. The "voting" relies mainly on the frequency with which a POI name appears on the Internet and on the credibility of its sources: the name that appears most frequently on the Internet and whose sources are most credible is chosen as the best name. For example:
Class A contains only one name, so that name is the best one.
Class B contains two names, of which "Yunnan Lancang River Beer Enterprise Group Baoshan Co., Ltd." appears most frequently and is taken as the best name.
Class C contains two names, of which "Baoshan Mingzhi Automobile Sales and Service Co., Ltd." appears most frequently and is taken as the best name.
Classes D and E likewise contain only one name each, as with class A.
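One way to combine frequency and source credibility in such a vote is to let each occurrence contribute the credibility of its source. The specification does not give a formula, so the weighting below is purely an assumption, as are the function name `vote` and the credibility mapping:

```python
def vote(occurrences, credibility):
    """Pick the name with the largest credibility-weighted occurrence total.

    occurrences: iterable of (name, source) pairs within one class
    credibility: mapping source -> weight; unknown sources weigh 0.0
    Returns None when there are no occurrences.
    """
    scores = {}
    for name, source in occurrences:
        scores[name] = scores.get(name, 0.0) + credibility.get(source, 0.0)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With equal weights for all sources this reduces to choosing the most frequent name, which matches the class B and class C examples above.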
To further illustrate the advantages of the invention, the internal structure of another embodiment of the POI determining unit 13 in the system for determining POI validity based on address data in a network is disclosed below, showing the details of another embodiment implemented with the POI determining unit 13. Referring to Fig. 3, the POI determining unit 13 further includes a judging subunit 131 and a POI information determining subunit 132:
the judging subunit 131 is configured to judge whether the number of occurrences of the POI in the address data in the network is higher than a predetermined threshold; and
the POI information determining subunit 132 is configured to determine that the acquired POI is valid when the judging subunit judges yes.
In this embodiment of the present invention, the higher the frequency with which a POI appears on the Internet and the more credible its sources, the more credible the POI information. The best POI names finally chosen are filtered according to the frequency with which they appear on the Internet and their sources; those above a certain threshold are the credible POIs finally mined.
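The threshold judgment can be sketched in a few lines of Python. The function name and the shape of the input are assumptions; "higher than" is read here as strictly greater than the threshold.

```python
def filter_valid(counts, threshold):
    """Keep POI names whose occurrence count is strictly above `threshold`.

    counts: mapping poi_name -> number of occurrences in the address data
    """
    return {name for name, n in counts.items() if n > threshold}
```

A name whose count merely equals the threshold is not treated as valid under this reading.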
In this embodiment of the present invention, the reliable source is a source having a predetermined credibility, where the source is a website or a web page.
In this embodiment of the present invention, websites or web pages that are sources of predetermined credibility include, but are not limited to, large websites such as Sina and Phoenix Net, officially certified websites, websites with high visit frequency and large data traffic, and websites that carry no malicious links or virus links and have relatively high user satisfaction.
In this embodiment of the present invention, credibility is quantifiable: the credibility of each website or web page can be quantified according to, for example, the number of user visits and user ratings. The credibility of each website or web page also changes dynamically: if a website currently contains viruses or fraudulent advertisements, or is exploited by other malicious websites, its credibility decreases. By quantifying and dynamically adjusting website credibility, the present invention further ensures that the acquired POI is reliable and valid.
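The specification does not state how credibility is computed, so the following Python sketch is only one possible quantification under stated assumptions: a log-scaled visit count and a 0 to 5 user rating are averaged into a score in [0, 1], and the score is multiplied by a penalty factor while the site is flagged for viruses, fraudulent advertisements, or malicious use. The function name, the weights, `max_visits`, and `penalty` are all hypothetical.

```python
import math


def credibility(visits, avg_rating, flagged, max_visits=1_000_000, penalty=0.5):
    """Hypothetical credibility score in [0, 1] for one website or web page.

    visits:     number of user visits in the observation window
    avg_rating: average user rating on an assumed 0-5 scale
    flagged:    True while the site is known to carry malicious content
    """
    # Log scaling keeps very large sites from dominating; capped at 1.0.
    traffic = min(math.log1p(visits) / math.log1p(max_visits), 1.0)
    rating = max(0.0, min(avg_rating / 5.0, 1.0))
    score = 0.5 * traffic + 0.5 * rating
    # Dynamic adjustment: a flagged site loses part of its credibility.
    return score * penalty if flagged else score
```

Recomputing the score whenever the inputs change gives the dynamic behavior described above: a site that becomes flagged drops in credibility, and recovers once the flag is cleared.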
This embodiment uses address data in the network to obtain multiple related POIs corresponding to the same POI name, and determines the valid POI corresponding to that POI name according to the number of occurrences of the POI in the address data in the network. This enables a user to quickly and accurately find the one or more POI names corresponding to a POI address at the same longitude and latitude. A network voting mechanism then filters the one or more POI names according to their information sources and the frequency with which they appear on the Internet, and a POI name with high credibility is selected as the POI name corresponding to the current POI address, which improves the validity of the POI.
Fig. 4 schematically shows a flowchart of a method for determining POI validity based on address data in a network according to an embodiment of the present invention.
Referring to Fig. 4, the method for determining POI validity based on address data in a network according to this embodiment of the present invention includes the following steps:
S11: use address data in the network to obtain multiple related POIs corresponding to the same POI name;
S12: count the number of occurrences of the POI in the address data in the network;
S13: determine the valid POI corresponding to the same POI name according to the number of occurrences of the POI in the address data in the network.
In this embodiment of the present invention, each of the multiple related POIs is information corresponding to at least one preset attribute of the POI, where the preset attribute is longitude and latitude, address, building name, or affiliated organization.
In this embodiment of the present invention, address data is crawled from network data based on a search engine, and the address data includes a name field and address information. An example of map address data mined from the Internet based on a search engine is: Name: Hengda Real Estate Group Kunming Company; Address: 14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming. Here, "Hengda Real Estate Group Kunming Company" is the name of the POI, and "14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming" is the address of the POI. Resolving the address to coordinates yields the longitude and latitude of the address; for example, resolving the address "14th Floor, Office Building A, North Star Fortune Center, Panlong District, Kunming" gives longitude 102.733445° E and latitude 25.08108° N. In addition, the number of times the POI appears on the Internet needs to be counted and its sources recorded.
However, POI data obtained from different source web pages for the same geographic location (identical longitude and latitude) may contain duplicate data; that is, the same address (longitude and latitude) may carry multiple POI names. For example, several companies may exist at the same longitude and latitude: their actual POI longitude and latitude are identical, but their POI names and POI addresses are described differently. It can also be seen that the same POI name may be written in several different ways, such as "Baoshan Mingzhi Automobile Sales Co., Ltd." and "Baoshan Mingzhi Automobile Sales and Service Co., Ltd.". Such duplicate POI data prevents a user from quickly and accurately finding the POI name corresponding to the POI address of a given POI geographic location (longitude and latitude).
To address this, in this embodiment of the present invention the POI names in the mined address data are segmented into words, and the number of occurrences of each word after segmentation is counted. Within a given POI name, the word that occurs least frequently (and therefore carries the most information) and is not a place name is taken as the keyword of that POI name.
To further illustrate the advantages of the invention, the sub-steps of step S12 in the method for determining POI validity based on address data in a network are disclosed below, showing another embodiment implemented according to this step. Referring to Fig. 5, the sub-steps of this step include:
S121: obtain the source of the POI;
S122: judge whether the source is a reliable source; if so, perform step S123; otherwise, do not count the occurrences;
S123: when the source is a reliable source, count the number of occurrences of the POI in the address data in the network.
In this embodiment, the best POI name within a class is chosen by "voting" on the Internet. The "voting" relies mainly on the frequency with which a POI name appears on the Internet and on the credibility of its sources: the name that appears most frequently on the Internet and whose sources are most credible is chosen as the best name.
To further illustrate the advantages of the invention, the sub-steps of step S13 in the method for determining POI validity based on address data in a network are disclosed below, showing another embodiment implemented according to this step. Referring to Fig. 6, the sub-steps of this step include:
S131: judge whether the number of occurrences of the POI in the address data in the network is higher than a predetermined threshold; if so, perform step S132;
S132: determine that the POI is valid.
In this embodiment of the present invention, the higher the frequency with which a POI appears on the Internet and the more credible its sources, the more credible the POI information. The best POI names finally chosen are filtered according to the frequency with which they appear on the Internet and their sources; those above a certain threshold are the credible POIs finally mined.
In this embodiment of the present invention, the reliable source is a source having a predetermined credibility, where the source is a website or a web page.
In this embodiment of the present invention, websites or web pages that are sources of predetermined credibility include, but are not limited to, large websites such as Sina and Phoenix Net, officially certified websites, websites with high visit frequency and large data traffic, and websites that carry no malicious links or virus links and have relatively high user satisfaction.
In this embodiment of the present invention, credibility is quantifiable: the credibility of each website or web page can be quantified according to, for example, the number of user visits and user ratings. The credibility of each website or web page also changes dynamically: if a website currently contains viruses or fraudulent advertisements, or is exploited by other malicious websites, its credibility decreases. By quantifying and dynamically adjusting website credibility, the present invention further ensures that the acquired POI is reliable and valid.
With the method for determining POI validity based on address data in a network provided by the embodiments of the present invention, the keywords of POI names are mined according to word frequency after segmentation, and clustering is performed on these keywords so that differently worded variants of the same POI name are gathered into one class, which solves the problem of one longitude and latitude corresponding to multiple POI names. The Internet "voting" mechanism is then used to choose the best POI name and to choose credible POI information.
In summary, the present invention uses address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name, and determines the valid POI corresponding to that POI name according to the number of occurrences of the POI in the address data in the network. This enables a user to quickly and accurately find the one or more POI names corresponding to a POI address at the same longitude and latitude. A network voting mechanism then filters the one or more POI names according to their information sources and the frequency with which they appear on the Internet, and a POI name with high credibility is selected as the POI name corresponding to the current POI address, which improves the validity of the POI information.
It should be noted that the algorithms and formulas provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the invention described herein, and that any description above of a specific language is made in order to disclose the best mode of the present invention.
Numerous specific details are set forth in the description provided herein. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, the disclosed method and apparatus should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.