JPH10283355A

JPH10283355A - Method for analyzing enterprise name and device therefor

Info

Publication number: JPH10283355A
Application number: JP9083535A
Authority: JP
Inventors: Shigeto Iwase; 成人岩瀬; Jun Mizutani; 純水谷
Original assignee: N T T SOFTWARE KK; Nippon Telegraph and Telephone Corp; NTT Software Corp
Current assignee: N T T SOFTWARE KK; Nippon Telegraph and Telephone Corp; NTT Software Corp
Priority date: 1997-04-02
Filing date: 1997-04-02
Publication date: 1998-10-23

Abstract

PROBLEM TO BE SOLVED: When a company name entered without organizational delimiters is divided into a main name and subordinate organization names, such as branch office and section names, for use in tasks that search by main name or subordinate organization name, to enable splitting even when the name contains no company-name ending word, to allow the most suitable expression to be written in the rules, and to make the rules easy to change. SOLUTION: A word division unit 201 divides an input company name into words using a word dictionary 202, in which the words that make up company names and their meanings are registered, and assigns a meaning to each word. A rule matching unit 203 matches the divided company name against delimiter rules for assigning organizational delimiters and analyzes the delimiter positions. A delimiter assignment unit 205 assigns organizational delimiters at the analyzed positions. A rule selection unit 401 selects one of the rule groups 402 to 405 from a rule storage unit 204, retrieves it, and applies it.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a company name analysis method and apparatus for customer systems that handle company names, in which a company name entered without organizational delimiters is divided into a main name and subordinate organizations, such as branch and department or section names, so that it can be used in tasks that search by main name or by subordinate organization name. In particular, it relates to a company name analysis method and apparatus in which the delimiter rules are held in a table so that the rules can be changed easily.

[0002]

[Prior art] In general, customer systems that handle company names, for example public customer services at city halls or post offices, or private customer services at advertising companies, banks, insurance companies, and the like, deal with names in various forms: long names that run from the main name down to a subordinate organization, names consisting of the main name only, names that omit the department, section, or subsection, and so on. Where necessary, only a uniformly defined name must be extracted from these various forms, and a search must use either the main name alone or a name divided down to the department or section name. One method for such cases is to split the name at the places where it contains a character string marking the end of a main name (for example, 市役所 (city hall), 図書館 (library), or 大学 (university)) or a specific character string denoting a subordinate organization (for example, 部 (department), 課 (section), 室 (office), or 支店 (branch)).

[0003]

[Problems to be solved by the invention] However, deciding where to split solely on the presence of a specific character string such as 市役所 (city hall) or 部 (department) causes the following problems. The ending word of a company name is not always a single word such as 大学 (university), 農協 (agricultural cooperative), or 商店 (shop); it may consist of two or more words, as in 振興／会 (promotion / association) or ショッピング／センター (shopping / center), and it is impossible to anticipate all of these in advance. When a word that marks the end of a main name is followed by 前 (in front of), as in 〇〇スーパー市役所前店 (〇〇 Supermarket, City Hall-front store), the meaning of the word changes to denote a landmark. Embedding each such case in a program one by one is extremely difficult and time-consuming. For the names of chain stores and the like, the split position sometimes cannot be determined from word meanings alone. For example, 〇〇レンタカー横浜／磯子店 (〇〇 Rent-a-Car Yokohama / Isogo store) and 〇〇マート／横浜磯子店 (〇〇 Mart / Yokohama Isogo store) both contain consecutive place names, yet they are split at different positions. In other cases, such as ＡＢＣ／渋谷店 (ABC / Shibuya store), the main name contains no clue for the split at all. An object of the present invention is therefore to solve these problems of the prior art and to provide a company name analysis method and apparatus that can split a company name entered without organizational delimiters according to delimiter rules so that it can be used for searching, and that can split correctly regardless of the subordinate organization name and even when the listed main name contains no clue for the split.

[0004]

[Means for solving the problems] To achieve the above object, the company name analysis apparatus of the present invention is provided with a rule interpretation unit that interprets rules written as combinations of the word notation itself, word meanings, and symbols such as a negation symbol, a delimiter symbol, and a symbol representing an arbitrary word; a rule matching unit that matches a company name against the delimiter rules; and a storage unit that stores the rules for assigning delimiters. The input company name is divided into words, meanings are assigned to the words, the rule matching unit analyzes the delimiter positions according to the rules, and delimiters are assigned at the positions found. Because the delimiter rules are held in a table, they are easy to change, and because rules can be written with symbols for word meaning, negation, word lists, and the like, the expressive power of the rules increases. In addition, because rules that check from the beginning of the name and rules that determine the main name from the subordinate organization are provided, names can be split correctly regardless of the subordinate organization name, as with convenience stores, and even when the listed main name contains no clue for the split.

[0005]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a diagram for explaining the principle of the present invention. The company name analysis method of the present invention can be broadly divided into two steps. Step 1 divides the character string into words and assigns a meaning to each word. Step 2 then refers to the rule storage unit, searches for a matching rule, and assigns organizational delimiters. In the first embodiment, the delimiter rules of step 2 are held as a table in the rule storage unit, which makes the rules easy to change. In the second embodiment, a rule can describe not only the words themselves but also word meanings, words that must not appear before or after a given position, word lists, and so on, which increases the expressive power of the rules. In the third embodiment, the delimiter rules of step 2 include not only rules that identify the ending word of a company name and split there, but also rules that match from the beginning of the name and rules that identify the subordinate-organization part and split in front of it, so that company names that cannot be split by the ending word alone can also be split.
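Step 1 can be pictured with a minimal sketch. The specification does not say how the word division unit segments text, so the greedy longest-match strategy, the dictionary contents, and the meaning codes used here ("9" for place names, "E" for ending words, "S" for subordinate-organization words, "?" for unknown characters) are all assumptions made for illustration.

```python
# Hypothetical word dictionary: surface form -> meaning code (codes are illustrative).
WORD_DICT = {
    "多摩": "9",    # place name
    "武蔵野": "9",  # place name
    "農協": "E",    # ending word of a main name (assumed code)
    "支所": "S",    # subordinate-organization word (assumed code)
}


def segment(name, dictionary=WORD_DICT):
    """Greedy longest-match segmentation.

    Returns a list of (surface, meaning) pairs. Characters that start no
    dictionary word become one-character words with the meaning "?".
    """
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(name):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(name) - i), 0, -1):
            candidate = name[i:i + length]
            if candidate in dictionary:
                words.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary word starts here: emit a single unknown character.
            words.append((name[i], "?"))
            i += 1
    return words
```

Calling `segment("多摩農協武蔵野支所")` with this dictionary yields the four words 多摩, 農協, 武蔵野, and 支所, each paired with its meaning code, which is the form of input that step 2 would then match against the rules.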

[0006] FIG. 2 is a block diagram of a company name analysis apparatus according to a first embodiment of the present invention, and FIG. 5 is an operation flowchart of the company name analysis method used in FIG. 2. In the first embodiment, the company name analysis apparatus comprises: a word division unit 201 that divides an input company name into words using a word dictionary and assigns a meaning to each word; a word dictionary 202 that stores the words themselves together with their inflections, meanings, and other data needed for analysis; a rule matching unit 203 that matches the company name, divided into words, against the delimiter rules; a delimiter rule storage unit 204 that stores the rules for assigning organizational delimiters; and a delimiter assignment unit 205 that assigns organizational delimiters at the delimiter positions found by the rule matching unit 203.

[0007] As shown in FIG. 5, the word division unit 201 first divides the input character string into words and assigns a meaning to each word (step 501). Next, the rule matching unit 203 repeats the subsequent processing over the input character string while shifting the rule matching position from the beginning to the end (step 502). The rule matching unit 203 takes rules out of the delimiter rule storage unit 204 one at a time (step 503) and checks whether the input matches the rule at the matching position (step 504). In other words, for the character string output by the word division unit 201 (already divided into words), the rule matching unit 203 proceeds from the first word to the last, taking rules out of the delimiter rule storage unit 204 in order and checking whether the input matches each rule at the matching position. If the pattern matches (step 506), it determines whether the resulting delimiter position is the end of the entry (step 507); if it is, no delimiter is inserted and the next rule is taken out (step 508). If the delimiter position is not the end of the entry, the name is split at the specified position (step 505). For example, word division of the input 多摩農協武蔵野支所 (Tama Agricultural Cooperative, Musashino branch office) yields 多摩／農協／武蔵野／支所. No rule matches the first word, 多摩, so a matching rule is sought for the next word, 農協. This word matches the ending-word rule that splits at 農協, so the delimiter assignment unit assigns the organizational delimiter, giving 多摩農協／武蔵野支所 (step 505). If the matched position is the end of the name, however, inserting a delimiter there is meaningless, so processing moves on to the next rule. For the input 多摩農協, for example, a delimiter after 農協 would be meaningless, so no delimiter is inserted and processing moves on to the next rule.
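The loop of FIG. 5 can be sketched as follows for the simplest case, in which every rule is a single ending word. This is a minimal illustration under stated assumptions rather than the patented implementation: the rule table contents, the function name `split_by_ending_words`, and the choice to record every matching position are hypothetical, and the step numbers in the comments only indicate which part of the description each line is meant to mirror.

```python
# Hypothetical contents of the delimiter rule table (ending words only).
ENDING_WORD_RULES = ["大学", "病院", "農協"]


def split_by_ending_words(words, rules=ENDING_WORD_RULES):
    """Insert "／" after each word that matches an ending-word rule.

    `words` is the already segmented name, e.g. ["多摩", "農協", "武蔵野", "支所"].
    """
    breaks = set()
    for pos, word in enumerate(words):      # step 502: shift the matching position
        for rule in rules:                  # step 503: take rules out one at a time
            if word != rule:                # step 504: does the rule match here?
                continue
            if pos == len(words) - 1:       # step 507: delimiter would fall at the end
                continue                    # step 508: no delimiter, go to the next rule
            breaks.add(pos)                 # step 505: split after this word
    out = []
    for pos, word in enumerate(words):
        out.append(word)
        if pos in breaks:
            out.append("／")
    return "".join(out)
```

With the segmented input 多摩／農協／武蔵野／支所 this returns 多摩農協／武蔵野支所, while the two-word input 多摩／農協 is returned without a delimiter because the match falls at the end of the name.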

[0008] FIG. 3 is a block diagram of a company name analysis apparatus according to a second embodiment of the present invention, and FIG. 6 is an operation flowchart of the company name analysis method used in FIG. 3. FIG. 7 is a table of the characters that can be used in rules, and FIG. 8 is a table of the meanings represented by half-width alphanumeric characters. As shown in FIG. 3, the second embodiment adds a rule interpretation unit 301 to the configuration of the first embodiment. That is, for matching by the rule matching unit 203, a rule is taken out of the delimiter rule storage unit 204 and interpreted by the rule interpretation unit 301, which simplifies the processing of the rule matching unit 203. The operation flow of the second embodiment is the same as the flow of FIG. 5 for the first embodiment, except that the operation of checking whether the input matches the rule at the matching position (step 504) is elaborated; the subsequent steps 505 to 508 are unchanged from FIG. 5.

[0009] When rules are taken out of the delimiter rule storage unit 204 one at a time and checked against the input at the matching position, one item is first taken from the head of the selected rule (step 601). Items are assumed here to be delimited by half-width characters. As shown in FIG. 8, the half-width alphanumeric meaning symbols denote the following: meaning symbol 1 denotes a surname or company name (for example, 鈴木), meaning symbol 2 denotes a given name (for example, 実), and meaning symbol 3 denotes an organization or position (for example, 課長 (section manager) or 官舎 (official residence)). As shown at the destinations of item selection 601 in FIG. 8, if the extracted item is a full-width character string, it is compared with the notation of the word at the matching position, and processing continues if they match (step 602). If the extracted item is a half-width digit or uppercase letter, it is compared with the meaning of the word at the matching position, and processing continues if they match (step 603). For symbols and half-width lowercase letters, steps 604 to 607 of FIG. 6 are performed. For example, when an opening bracket "[" appears, everything up to the closing bracket "]" is stored as a list of words to match (steps 605 and 606). For a period ".", the matching position advances to the next word. For an asterisk "*", the matching position is shifted until the next rule item matches (step 604). For the lowercase letter "x", the matching condition becomes that the word or word list that follows must not match; that is, the matching condition is set to non-match (step 607). Finally, the matching result is returned to the caller and processing continues.

[0010] As shown in FIG. 7, the characters that can be written in a rule are: kanji character strings (for example, ＮＴＴ); the symbol "%" (organizational delimiter; for example, ＮＴＴ％渋谷／支店); half-width alphanumeric characters (word meaning; for example, 9支役所); the period "." (any single word; for example, ＮＴＴ％．支店); the asterisk "*" (zero or more arbitrary words; for example, ＮＴＴ％＊支店); the lowercase letter "x" (the word or word list written next must not follow; for example, ＮＴＴ％ｘソフトウェア); the brackets "[" and "]" (a list of words; for example, ＮＴＴ％ｘ［ソフトウェア，ドコモ，パーソナル］); the comma "," (separator within a word list); the slash "/" (word separator; for example, ＮＴＴ／ソフトウェア); and so on.
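To make the rule notation concrete, here is a minimal sketch of a rule interpreter and matcher. It is one possible reading, not the patented implementation. Assumptions: the symbols are written in ASCII (`%`, `.`, `*`, `x`, `/`, `,`, `[`, `]`), a meaning code is a single ASCII digit or uppercase letter, any run of non-ASCII characters is a literal word, `x` is treated as a lookahead that consumes no word, and `*` is resolved by trying successive positions. All function names are hypothetical, the sample rules are adapted from the examples in the text, and the meaning code 9 is assumed to denote a place name.

```python
import re

# Assumed ASCII surface syntax: symbols, bracketed lists, one-character
# meaning codes, and non-ASCII runs as literal words.
TOKEN = re.compile(r"[%.*x/]|\[[^\]]*\]|[0-9A-Z]|[^\x00-\x7f]+")


def parse_rule(rule):
    """Turn a rule string into items such as ("lit", "支店") or ("brk",)."""
    items, negate = [], False
    for tok in TOKEN.findall(rule):
        if tok == "/":                 # word separator between adjacent literals
            continue
        if tok == "x":                 # negation applies to the next item
            negate = True
            continue
        if tok == "%":
            item = ("brk",)
        elif tok == ".":
            item = ("any",)
        elif tok == "*":
            item = ("star",)
        elif tok.startswith("["):
            item = ("list", tok[1:-1].split(","))
        elif tok.isascii():
            item = ("mean", tok)       # digit or uppercase letter: meaning code
        else:
            item = ("lit", tok)        # full-width string: literal word
        if negate:
            item, negate = ("not", item), False
        items.append(item)
    return items


def word_matches(item, word):
    """Check one consuming item against one (surface, meaning) word."""
    surface, meaning = word
    kind = item[0]
    if kind == "lit":
        return surface == item[1]
    if kind == "mean":
        return meaning == item[1]
    if kind == "list":
        return surface in item[1]
    return kind == "any"


def match_items(items, words, pos):
    """Return the delimiter positions if items match words from pos, else None."""
    if not items:
        return []
    head, rest = items[0], items[1:]
    if head[0] == "brk":               # record a delimiter before words[pos]
        tail = match_items(rest, words, pos)
        return None if tail is None else [pos] + tail
    if head[0] == "star":              # skip zero or more words
        for p in range(pos, len(words) + 1):
            tail = match_items(rest, words, p)
            if tail is not None:
                return tail
        return None
    if head[0] == "not":               # the next word must NOT match
        if pos < len(words) and word_matches(head[1], words[pos]):
            return None
        return match_items(rest, words, pos)
    if pos >= len(words) or not word_matches(head, words[pos]):
        return None
    return match_items(rest, words, pos + 1)


def apply_rule(rule, words, anchored=True):
    """Match from the first word only, or from every position if not anchored."""
    items = parse_rule(rule)
    for start in ([0] if anchored else range(len(words))):
        breaks = match_items(items, words, start)
        if breaks is not None:
            # A delimiter at the very start or end of the name is meaningless.
            breaks = [b for b in breaks if 0 < b < len(words)]
            if breaks:
                return breaks
    return None
```

A returned position `b` means that a delimiter goes in front of `words[b]`. Under these assumptions the rule `9市役所%x前` places a delimiter after 武蔵野／市役所 when 総務／課 follows, but does not match when 前 follows, which is the landmark case described in paragraph [0003].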

[0011] FIG. 4 is a block diagram of a company name analysis apparatus according to a third embodiment of the present invention, and FIG. 9 is an operation flowchart of the company name analysis method used in FIG. 4. In the third embodiment, as shown in FIG. 4, a rule selection unit 401 is connected to the rule interpretation unit 301, and the delimiter rule storage unit 204 is connected to the rule selection unit 401. The delimiter rule storage unit 204 holds rules 402 that match from the beginning, rules 403 that are referenced from an arbitrary position, rules 404 that delimit the main name from the subordinate organization, and rules 405 that delimit subordinate organizations. The flow of FIG. 9 shows in detail step 503 of the flow in FIG. 5, in which rules are taken out of the delimiter rule storage unit 204 one at a time; the other steps are the same as in FIG. 5. When rules are taken out of the delimiter rule storage unit 204, exactly one of the following is selected: the group of rules that match from the beginning (step 902), the group of rules that split at the corporation type (step 903), the group of rules that split at an ending word at an arbitrary position (step 904), the group of rules that insert the main-name delimiter in front of a subordinate organization (step 905), or the group of rules for delimiting subordinate organizations (step 906). Matching applies the classified rules in order.
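The ordered application of rule groups can be sketched as below. This is an illustration under assumptions, not the method of FIG. 9 itself: only three of the five groups are shown, each group is reduced to a small hypothetical Python function instead of table-driven rules, the meaning codes "C" (corporation type) and "9" (place name) are assumed, and the policy that the first group producing a usable delimiter wins is one possible reading of "applies the classified rules in order".

```python
def drop_edges(breaks, words):
    """Discard delimiter positions at the very start or end of the name."""
    return [b for b in breaks if 0 < b < len(words)]


def by_corporation_type(words):
    # Split after a word whose meaning code is "C", such as （株）.
    return [i + 1 for i, (_, meaning) in enumerate(words) if meaning == "C"]


def by_ending_word(words, endings=("大学", "病院", "農協")):
    # Split after a registered ending word found at any position.
    return [i + 1 for i, (surface, _) in enumerate(words) if surface in endings]


def before_place_plus_store(words):
    # Split in front of "place name + 店".
    return [i for i in range(len(words) - 1)
            if words[i][1] == "9" and words[i + 1][0] == "店"]


# Groups are tried in this order; the order itself is an assumption.
RULE_GROUPS = [
    ("corporation type", by_corporation_type),
    ("ending word", by_ending_word),
    ("before subordinate organization", before_place_plus_store),
]


def analyze(words, groups=RULE_GROUPS):
    """Return (group name, delimiter positions) for the first group that fires."""
    for name, group in groups:
        breaks = drop_edges(group(words), words)
        if breaks:
            return name, breaks
    return None, []
```

Here `words` is a list of (surface, meaning) pairs, and a returned position `b` means that a delimiter goes in front of `words[b]`. Keeping each group as a separate entry in `RULE_GROUPS` mirrors the idea that groups can be added, removed, or reordered without touching the matching loop.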

[0012] FIG. 10 is an explanatory diagram showing examples of delimiter rules in the present invention. For the type that attempts matching from the beginning, rules such as ９支役所％ｘ前 and ＮＴＴ％．支店 are registered, and at analysis time names are split as, for example, 武蔵野市役所％総務課 (Musashino City Hall % General Affairs Section) and ＮＴＴ％吉祥寺支店 (NTT % Kichijoji branch). To split chain stores such as convenience stores accurately, for example, rules that match from the beginning can improve the accuracy of the split. For the corporation type, a rule such as Ｃ％ is registered, and at analysis time a name is split as, for example, 日本電信電話（株）％総務部 (Nippon Telegraph and Telephone Corporation % General Affairs Department). Because symbols indicating the corporation type, such as （株） and （有）, almost certainly mark an organizational boundary, a separate delimiter rule is provided for corporation types. For the type that attempts matching on the ending word, words such as 大学, 病院, and 農協 are registered, and at analysis time names are split as, for example, 東京大学％理学部 (University of Tokyo % Faculty of Science), 武蔵載病院％内科 (武蔵載 Hospital % Internal Medicine), and 多摩農協％武蔵野支所 (Tama Agricultural Cooperative % Musashino branch office). For the type that delimits the main name in front of a subordinate organization, a rule such as ％９店 is registered, and at analysis time a name is split as, for example, ＡＢＣ％渋谷店 (ABC % Shibuya store). In a name such as ＡＢＣ渋谷店, the main name ＡＢＣ gives no clue for the organizational boundary; even so, registering ％９店 allows a delimiter to be inserted in front of "place name + 店". For the rule type that delimits subordinate organizations, a word such as 部 is registered, and at analysis time a name is split as, for example, ＮＴＴ％総務部％総務課 (NTT % General Affairs Department % General Affairs Section). With a rule that analyzes the subordinate organization and treats the position immediately before it as the end of the main name, names can be split even when the main name contains no clue. For example, registering Ｎ／自動車／横浜 allows a name such as Ｎ自動車横浜磯子店 (N Motors Yokohama Isogo store), in which the organizational boundary falls between place names, to be split accurately.

[0013]

[Effects of the invention] As described above, according to the present invention, the delimiter rules are held in a table in which meanings, words, negation, word lists, and the like can be written, so that the most suitable expression can be written for each case and the rules can be changed easily thanks to the table form. In addition, because rules that match from the beginning and rules that determine the main-name delimiter from the subordinate organization are provided, names can be split easily even when they contain no company-name ending word.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the operating principle of the company name analysis method of the present invention.

FIG. 2 is a block diagram of a company name analysis apparatus according to the first embodiment of the present invention.

FIG. 3 is a block diagram of a company name analysis apparatus according to the second embodiment of the present invention.

FIG. 4 is a block diagram of a company name analysis apparatus according to the third embodiment of the present invention.

FIG. 5 is an operation flowchart of the company name analysis method of the first embodiment in FIG. 2.

FIG. 6 is an operation flowchart of the company name analysis method of the second embodiment in FIG. 3.

FIG. 7 is a diagram showing the characters that can be written in a rule in the present invention.

FIG. 8 is a diagram showing the meanings of the half-width alphanumeric characters in the present invention.

FIG. 9 is an operation flowchart of the company name analysis method of the third embodiment in FIG. 4.

FIG. 10 is an explanatory diagram showing an example of each of the rules in FIG. 9.

[Explanation of symbols]

201: word division unit; 202: word dictionary; 203: rule matching unit; 204: delimiter rule storage unit; 205: delimiter assignment unit; 301: rule interpretation unit; 401: rule selection unit; 402: rules that match from the beginning; 403: rules referenced from an arbitrary position; 404: rules that delimit the main name from the subordinate organization; 405: rules that delimit subordinate organizations.

Claims

1. A company name analysis apparatus that assigns organizational delimiters to a company name input without delimiters, comprising: a word division unit that divides the input company name into words using a word dictionary in which the words constituting company names and their meanings are registered, and assigns the meanings of the words; a delimiter rule storage unit that stores rules for assigning organizational delimiters; a rule matching unit that matches the company name divided into words against the delimiter rules; and a delimiter assignment unit that assigns organizational delimiters at the delimiter positions analyzed by the rule matching unit.

2. The company name analysis apparatus according to claim 1, comprising a rule interpretation unit that interprets rules written as combinations of the word notation itself, word meanings, and symbols such as a negation symbol, a delimiter symbol, and a symbol representing an arbitrary word.

3. The company name analysis apparatus according to claim 1, wherein the rule storage unit stores the rules classified into rules that refer to a company name from the beginning, rules that define words that should come at the end of a main name, rules that delimit the main name by a word indicating a subordinate organization, and rules that delimit subordinate organizations, and wherein a rule selection unit is provided that applies the classified rules in order.

4. A company name analysis method for assigning organizational delimiters to a company name input without delimiters, comprising: a first step of dividing the input company name into words using a word dictionary in which the words constituting company names and their meanings are registered, and assigning the meanings of the words; a second step of matching the company name divided into words in the first step against delimiter rules for assigning organizational delimiters, and analyzing the delimiter positions; and a third step of assigning organizational delimiters at the delimiter positions analyzed in the second step.