JPH08212301A

JPH08212301A - Device and method for segmenting character

Info

Publication number: JPH08212301A
Application number: JP7019245A
Authority: JP
Inventors: Hideto Yamamoto; 英人山本; Masayoshi Okamoto; 正義岡本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1995-02-07
Filing date: 1995-02-07
Publication date: 1996-08-20

Abstract

PURPOSE: To segment characters accurately even when the input character string has no fixed character size or pitch, by determining the character type of the input character set and changing the segmentation parameters according to the determined type. CONSTITUTION: A user registers the type of characters to be input by pointing with a pen at the icon corresponding to that character type. In a segmentation unit 7, a character type determination unit 51 determines which character types make up the input character string. In a determination unit 72', a determination threshold changing unit 52 changes the segmentation parameters according to the character type determined by the character type determination unit 51, and a determination processing unit 53 classifies each gap as a character gap or a non-character gap. In an optimum path detection unit 73', a segmentation feature evaluation unit 54 changes its parameters according to the character type determined by the character type determination unit 51, and a segmentation position determination unit 55 determines the segmentation positions.

Description

Detailed Description of the Invention

[0001] The present invention relates to a segmentation technique for cutting a plurality of characters out unit by unit. In particular, it relates to a technique for detecting the segmentation positions (the positions between characters) used to cut character data into individual characters, where the character data is, for example, handwriting data input from a tablet or optical data input by OCR.

[0002]

[Prior Art] Electronic notepads and similar devices have been developed that perform character recognition on handwritten input, convert the handwriting data into character data, and store it. Such an electronic notepad provides a plurality of writing frames, each for one character, and the user writes one character by hand in each frame. Characters are then recognized from the handwriting data of each frame.

[0003] To allow freer handwriting input, continuous writing in an area that has no per-character frames has also been considered. In that case, however, the continuously written handwriting data must be cut into individual characters. The segmentation positions are, naturally, the gaps between characters. The character gaps therefore have to be detected automatically from the continuously written data, and various techniques have been proposed for this purpose.

[0004] For example, the segmentation positions can be judged from the spacing between the individual pieces of handwriting data (strokes). They can also be detected from the time during which the pen is lifted off the writing surface (pen OFF time). Alternatively, the correct segmentation positions can be determined from the evaluation value obtained when the segmented character string is subjected to character recognition (character recognition reliability).

[0005] Furthermore, the correct segmentation positions can be determined not only from the character recognition reliability but also from an evaluation value obtained by applying language processing to the character recognition result (language processing reliability). The above means of determination can also be combined. These approaches are publicly known, being disclosed in, for example, Japanese Patent Laid-Open No. 6-124364 (G06K9/34) and Japanese Patent Laid-Open No. 6-162269 (G06K9/62).

[0006] Although the prior art is thus well known, one conventional example is briefly described below with reference to FIGS. 1 to 3. FIG. 1 is a block diagram showing the schematic configuration of a character segmentation device. In the figure, 1 is an input unit, which consists of an electromagnetic induction tablet. Because the tablet is of the electromagnetic induction type, it can detect not only the touch of the electromagnetic induction pen on the tablet surface but also how far the pen is lifted. The input unit 1 detects the pen coordinates (x, y) and the pen state (pen FAR, pen OFF, or pen ON) at fixed time intervals and outputs this data. In the pen FAR state the pen tip is far away from the writing surface, in the pen OFF state the pen tip is lifted off the writing surface, and in the pen ON state the pen tip is in contact with the writing surface.

[0007] Reference numeral 2 denotes a central processing unit (CPU) that performs arithmetic processing and controls the entire device. 3 is a ROM, which stores the control program of the CPU 2 and various data. 4 is a memory (RAM). The RAM 4 is used as a work area for the CPU 2 and also has an area for storing the handwriting data input from the input unit 1, an area for storing the character information produced by the segmentation unit 7, and an area for storing document information.

[0008] Reference numeral 7 denotes a segmentation unit, which consists of a stroke sequence extraction unit 71, a determination unit 72, and an optimum path detection unit 73. The stroke sequence extraction unit 71 obtains the stroke sequence. The determination unit 72 first determines, for each gap between strokes, whether it is a character gap, a non-character gap (a gap inside a single character), or undeterminable. Using the data from the input unit 1, the stroke sequence extraction unit 71 divides and manages the coordinate data stroke by stroke according to the pen state of the handwriting data, and calculates the data needed for segmentation, such as the spacing between strokes and the pen OFF time.

[0009] The determination unit 72 determines whether each gap between strokes is a character gap, a non-character gap, or undeterminable. One example of the determination method follows. The method in this conventional example uses the OFF time, the space width, and the FAR time (the time during which the pen is so far from the writing surface that its position cannot be sensed). Here, when the OFF time (the time during which the pen is lifted by about 1 cm or less) exceeds the character-gap confirmation OFF time threshold (about 0.5 sec), the gap is taken to be a character gap and its state value is set to "1".

[0010] Likewise, when the FAR time (the time during which the pen is lifted by about 1 cm or more) exceeds the character-gap confirmation FAR time threshold (about 0.0 sec), the gap is taken to be a character gap and the state value is set to "1". When the space width (in the X direction) exceeds the character-gap confirmation space width threshold (about 3.0 mm), the gap is also taken to be a character gap and the state value is set to "1".

[0011] Conversely, when the OFF time is shorter than the non-character-gap confirmation OFF time (about 0.1 sec), the gap is taken to be a non-character gap and the state value is set to "0". When the space width (in the X direction) is smaller than the non-character-gap confirmation space width threshold (about −1.0 mm), the state value is likewise set to "0". In all other cases the gap is undeterminable and the state value is set to "2".
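The three-way classification of paragraphs [0009] to [0011] can be sketched as follows. This is a minimal illustration only: the names `GapThresholds` and `classify_gap` are hypothetical, the threshold values are the approximate ones quoted in the text, and checking the character-gap conditions before the non-character-gap conditions is an assumption, since the text does not say which rule wins when both apply.

```python
from dataclasses import dataclass

# State values used in the text: 1 = character gap, 0 = non-character gap, 2 = undeterminable.
CHAR_GAP, NON_CHAR_GAP, UNDETERMINED = 1, 0, 2


@dataclass(frozen=True)
class GapThresholds:
    """Approximate threshold values quoted in paragraphs [0009] to [0011]."""
    char_off_sec: float = 0.5        # OFF time above this confirms a character gap
    char_far_sec: float = 0.0        # FAR time above this confirms a character gap
    char_space_mm: float = 3.0       # X-direction space above this confirms a character gap
    non_char_off_sec: float = 0.1    # OFF time below this confirms a non-character gap
    non_char_space_mm: float = -1.0  # space below this (strokes overlap) confirms a non-character gap


def classify_gap(off_sec, far_sec, space_mm, t=GapThresholds()):
    """Return the state value for one gap between two consecutive strokes."""
    # Assumed rule order: character-gap conditions are tested first.
    if off_sec > t.char_off_sec or far_sec > t.char_far_sec or space_mm > t.char_space_mm:
        return CHAR_GAP
    if off_sec < t.non_char_off_sec or space_mm < t.non_char_space_mm:
        return NON_CHAR_GAP
    return UNDETERMINED
```

Gaps that come back as `UNDETERMINED` are the ones left for the optimum path detection unit 73 to resolve.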

[0012] FIG. 2 shows an example of such writing. When "今日が" is written as shown at E1 in FIG. 2, it is converted into a stroke sequence as shown at E3. Suppose that each gap between strokes is then classified as a character gap or a non-character gap as shown at E4. A gap with state value "1" is always a character gap, a gap with state value "0" is always a non-character gap, and a gap with state value "2" may be either. When this stroke sequence is cut as finely as possible, it can be divided into the basic segments shown at E5 in FIG. 2. As shown at E6, nine stroke sets exist in this case.

[0013] Eight combinations of these stroke sets are possible (for example, 1-3-4-6-7-9 and 1-3-4-6-8). The optimum path detection unit 73 detects the most probable of these combinations. The relationships among the stroke sets can be represented as a network such as that shown in FIG. 3.

[0014] In FIG. 3, the nodes of the network correspond, number for number, to the nine stroke sets E6 shown in FIG. 2. Adjacent nodes (stroke sets) are connected by links that represent character gaps. The weight of a node represents its one-character likeness, and the weight of a link represents its character-gap likeness. Feature values representing one-character likeness and character-gap likeness are calculated for each node and link, and the optimum path is found by totaling them along each path. Suppose that, in this case, the two paths 2-5-8 and 2-4-6-8 receive high scores.
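The path search over the stroke-set network can be sketched as below. FIG. 3 is not reproduced here, so the successor table is an assumption reconstructed from the example paths quoted in the text (it yields exactly eight paths, including 1-3-4-6-7-9, 1-3-4-6-8, 2-5-8 and 2-4-6-8). The node scores are hypothetical values chosen only so that the two paths named in the text rank highest, and all function names are illustrative.

```python
# Assumed network: which stroke set may follow which (reconstructed, not taken from FIG. 3).
SUCCESSORS = {1: [3], 2: [4, 5], 3: [4, 5], 4: [6], 5: [7, 8], 6: [7, 8], 7: [9], 8: [], 9: []}
STARTS = [1, 2]
ENDS = {8, 9}

# Hypothetical one-character likeness per node (closer to 0 is better).
EXAMPLE_NODE_SCORE = {1: -1.0, 2: -0.1, 3: -1.0, 4: -0.3, 5: -0.2, 6: -0.3, 7: -1.0, 8: -0.1, 9: -1.0}


def all_paths():
    """Yield every start-to-end path through the network as a tuple of node numbers."""
    def walk(path):
        last = path[-1]
        if last in ENDS:
            yield tuple(path)
            return
        for nxt in SUCCESSORS[last]:
            yield from walk(path + [nxt])

    for start in STARTS:
        yield from walk([start])


def path_score(path, node_score, link_score):
    """Total the node weights and the link weights along one path."""
    total = sum(node_score[n] for n in path)
    total += sum(link_score.get((a, b), 0.0) for a, b in zip(path, path[1:]))
    return total


def best_paths(node_score, link_score, top=2):
    """Return the `top` highest-scoring paths, best first."""
    ranked = sorted(all_paths(), key=lambda p: path_score(p, node_score, link_score), reverse=True)
    return ranked[:top]
```

With the example scores and no link weights, `best_paths(EXAMPLE_NODE_SCORE, {})` returns the paths 2-5-8 and 2-4-6-8. Exhaustive enumeration is used for clarity; a dynamic-programming search would give the same result on a larger network.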

[0015] Reference numeral 5 in FIG. 1 denotes a recognition unit. The recognition unit 5 consists of a known character recognition processor; it performs character recognition on the stroke sets input to it by referring to a recognition dictionary 10, and returns recognition result candidates (character codes) together with their scores. In this example, the stroke sets of the combination 2-5-8 and those of the combination 2-4-6-8 described above are input and recognized.

[0016] Reference numeral 6 denotes a language processing unit. The language processing unit 6 consists of a known language processing processor, and returns words (character code strings) and their scores from the character strings obtained by combining the character recognition result candidates produced by the recognition unit 5. 8 is a character string recognition unit. The character string recognition unit 8 determines the correct character string by comprehensively judging the score from the optimum path detection unit 73, the score from the recognition unit 5, and the score from the language processing unit 6.

[0017] Reference numeral 9 denotes a display unit. The display unit 9 consists of a display and shows, among other things, the handwriting data input from the input unit 1. Although the input unit 1 and the display unit 9 are shown separately in this example, they may be integrated, as in a display-integrated tablet. 12 is an internal bus, which includes the data bus, address bus, and control signal bus from the CPU 2.

[0018] In character segmentation, if the standard reference format data of the characters to be input (standard character width, standard character aspect ratio, character pitch, character space width, and so on) is known in advance, the determination unit 72 and the optimum path detection unit 73 can segment the input easily. That is, when the standard character format data is as shown in FIG. 4(a), segmentation is performed in accordance with that format data.

[0019] For example, in the optimum path detection unit 73, the closer a set of basic segments is to the character size of the reference format data, the more strongly it is judged to be "likely one character". Likewise, the closer a blank space lies to a position given by the character pitch of the reference format data, the more strongly it is judged to be "likely a character gap". And the further a blank space exceeds the character space width, the more strongly it is judged to be "likely a character gap".
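The three judgments in paragraph [0019] can be illustrated with simple scoring functions. The text gives no formulas, so everything below is an assumption: the linear fall-off, the equal weighting of the two gap terms, the idea that pitch positions are multiples of the reference pitch measured from the start of the line, and the names `one_char_likeness` and `gap_likeness`.

```python
def one_char_likeness(segment_width_mm, ref_width_mm):
    """Score 1.0 when the width equals the reference width, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(segment_width_mm - ref_width_mm) / ref_width_mm)


def gap_likeness(gap_center_mm, gap_width_mm, line_start_mm, ref_pitch_mm, ref_space_mm):
    """Score a blank space by its closeness to a pitch position and by its width."""
    # Distance from the gap centre to the nearest multiple of the reference pitch.
    offset = (gap_center_mm - line_start_mm) % ref_pitch_mm
    dist = min(offset, ref_pitch_mm - offset)
    pitch_term = 1.0 - dist / (ref_pitch_mm / 2)  # 1.0 on a pitch position, 0.0 midway between two
    # A wider gap scores higher; the term saturates once the reference space width is reached.
    width_term = min(1.0, max(0.0, gap_width_mm / ref_space_mm))
    return 0.5 * (pitch_term + width_term)
```

Both functions take the reference values as arguments, so the same sketch works whether the reference format data is that of FIG. 4(a) or that of FIG. 4(d).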

[0020] Thus the input of FIG. 4(b) is likely to be segmented as shown in FIG. 4(c). When the standard character format data is as shown in FIG. 4(d), on the other hand, the input of FIG. 4(b) is likely to be segmented as shown in FIG. 4(e). If the size of the input characters is known, segmentation errors are unlikely.

[0021] Character size differs by character type. Conventional systems, however, have not considered varying the segmentation process according to character type. Varying the character recognition process, as opposed to the segmentation process, according to character type has been done before. It is also generally known that characters of the same type tend to occur consecutively in a character string. This has been applied to language processing, so that results in which characters of the same type continue are selected and output from among the character recognition result candidates. However, determining which character types make up the input character string and changing the segmentation process according to the result has not been done.

[0022]

[Problems to be Solved by the Invention] In character strings in which characters of several types are mixed, character size and character space width tend to be irregular, and segmentation accuracy tends to fall. When the size of the input characters departs from the standard size, segmentation goes wrong. Segmentation also goes wrong when characters of different sizes are input together.

[0023] Claims 1 to 5 and 10 of the present application aim to improve character segmentation accuracy by determining the character type composition of the input character string and performing character segmentation according to the result. Claims 6 to 9 and 11 aim to improve character segmentation accuracy for character data whose character size and character space width are irregular.

[0024] When irregular character data is segmented, several different standard character widths are needed as references. Segmenting with several standard character widths is, however, difficult to implement. Incidentally, the standard character pitch is naturally set smaller than the standard character width, and (standard character pitch − standard character width) is the standard character space width.

[0025] The format data items of characters are thus interrelated, and segmentation positions have been detected by using this data to evaluate character-gap likeness and one-character likeness. This, however, presupposes that the input characters are uniform in size. To improve character segmentation accuracy for character data whose character size and character space width are irregular, claims 6 to 9, 11, and 12 of the present application treat the evaluation of character-gap likeness and the evaluation of one-character likeness, which have conventionally been carried out as interdependent segmentation processing based on interrelated character format data, as separate processes, and thereby cope with irregular character data.

[0026]

[Means for Solving the Problems] The character segmentation device of the present application has input means (1) for inputting character data and segments characters from the character data input through the input means (1). It is characterized by comprising character type determination means (51, 51') for determining the character type of the input character set, and changing means (52, 54) for changing segmentation parameters on the basis of the character type determined by the character type determination means (51, 51').

[0027] The character segmentation device of the present application has input means (1) for inputting character data and segments characters from the character data input through the input means (1). It is characterized by comprising character type determination means (51, 51') for determining the character type of the input character set, and changing means (52, 54) for selectively using a plurality of segmentation methods on the basis of the character type determined by the character type determination means (51, 51').

[0028] Further, the character segmentation device of the present application is characterized in that the determination by the character type determination means (51) is made on the basis of the setting of a character type setting mode. Alternatively, the device is characterized in that the determination by the character type determination means (51') is made on the basis of the number of strokes or the shape of the character data. Alternatively, the device is characterized in that recognition means (5, 10) for recognizing the segmented characters are provided and the determination by the character type determination means is made on the basis of the result of the recognition means (5, 10).

[0029] The character segmentation device of the present application is characterized by comprising input means (1) for inputting character data, character count estimation means (57) for estimating the number of characters from the character data input through the input means (1), changing means (54) for changing segmentation parameters on the basis of the number of characters obtained by the character count estimation means (57), and segmentation position detection means (55) for detecting segmentation positions using the segmentation parameters changed by the changing means (54).

[0030] The character segmentation device of the present application is characterized by comprising input means (1) for inputting character data, character count estimation means (57) for estimating the number of characters from the character data input through the input means (1), changing means (54) for changing segmentation parameters on the basis of the number of characters obtained by the character count estimation means (57), segmentation position detection means (55) for detecting segmentation positions using the segmentation parameters changed by the changing means (54), and reprocessing instruction means (58) for causing the character segmentation process to be performed again on the basis of the number of characters in the result obtained by the segmentation position detection means (55).

[0031] The character segmentation device of the present application is characterized by comprising input means (1) for inputting character data, character count estimation means (57) for estimating the number of characters from the character data input through the input means (1), changing means (54) for changing segmentation parameters on the basis of the number of characters obtained by the character count estimation means (57), segmentation position detection means (55) for detecting segmentation positions using the segmentation parameters changed by the changing means (54), and reprocessing instruction means (58) for comparing the number of characters in the result obtained by the segmentation position detection means (55) with the number of characters obtained by the character count estimation means (57) and, when the former is smaller than the latter, causing the character segmentation process to be performed again on the basis of the number of characters in the result obtained by the segmentation position detection means (55).
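The control flow of the reprocessing instruction means (58) in paragraph [0031] can be sketched as follows. How the character count is estimated and how it alters the segmentation parameters is not stated in this passage, so `estimate_count` and `segment` are hypothetical callables supplied by the caller, and `segment_with_retry` is an illustrative name.

```python
def segment_with_retry(strokes, estimate_count, segment):
    """Segment once, then once more if fewer characters came out than were estimated.

    estimate_count(strokes) -> int and segment(strokes, char_count) -> list of
    characters are hypothetical callables standing in for means (57) and (54, 55).
    """
    estimated = estimate_count(strokes)
    chars = segment(strokes, estimated)
    if len(chars) < estimated:
        # Re-run, this time driven by the character count of the first result.
        chars = segment(strokes, len(chars))
    return chars
```

The sketch performs at most one extra pass, which is what the paragraph describes; whether further passes would be allowed is not stated.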

[0032] The character segmentation device of the present application is characterized by comprising input means (1) for inputting character data, character count estimation means (57) for estimating the number of characters from the character data input through the input means (1), changing means (54) for changing segmentation parameters on the basis of the number of characters obtained by the character count estimation means (57), segmentation position detection means (55) for detecting segmentation positions using the segmentation parameters changed by the changing means (54), by evaluating at least characters and character gaps separately and combining the separate evaluations, and reprocessing instruction means (58) which, when the character segmentation process is to be performed again on the basis of the number of characters in the result obtained by the segmentation position detection means (55), causes it to be performed again such that the evaluation of characters in the segmentation position detection means (55) is based on the number of characters obtained by the character count estimation means (57) and the evaluation of character gaps is based on the number of characters in the result obtained by the segmentation position detection means (55).

[0033] The character segmentation device of the present application has input means (1) for inputting handwriting data and segments characters from the handwriting data input through the input means (1). It is characterized by comprising character type determination means (51, 51') for determining the character type of the input character set, changing means (52, 54) for changing segmentation parameters on the basis of the character type determined by the character type determination means (51, 51'), and segmentation position detection means (53, 55) for detecting segmentation positions using the segmentation parameters changed by the changing means (52, 54).

[0034] The present application also provides a character segmentation method for a character segmentation device that detects segmentation positions by judging the one-character likeness and character-gap likeness of character data input from input means (1), characterized in that the reference data for judging one-character likeness and the reference data for judging character-gap likeness are set individually. The present application further provides a character segmentation method for a character segmentation device that detects segmentation positions by judging the one-character likeness and character-gap likeness of character data input from input means (1) using at least a reference character width and a reference character pitch, characterized in that the values of the reference character width and the reference character pitch are set individually.

[0035]

[Operation] In the character segmentation device of the present invention, the character type of the handwriting data input through the input means (1) is determined by the character type determination means (51, 51'), and the segmentation parameters are changed on the basis of the determined character type. Also, in the character segmentation device of the present invention, the number of characters in the handwriting data input through the input means (1) is estimated by the character count estimation means (57), the changing means (54) evaluates segmentation features on the basis of the number of characters obtained, and the segmentation position detection means (55) determines the character segmentation positions on the basis of that evaluation.

[0036] In the character segmentation method of the present invention, one-character likeness and character-gap likeness can be judged separately.

[0037]

[Embodiments] A first embodiment of the character segmentation device of the present application is described with reference to FIGS. 5 and 6. FIG. 5 shows the configuration of the character segmentation unit 7. In the figure, parts identical to those in FIG. 1 are omitted, and the identical parts that are shown carry the same reference numerals and are not described again.

[0038] Reference numeral 7 denotes the segmentation unit. 51 is a character type determination unit for the input character string; it determines which character types make up the input character string. 72' is a determination unit. 52 is a determination threshold changing unit, in which the segmentation parameters are changed according to the character type determined by the character type determination unit 51. 53 is a determination processing unit that classifies gaps as character gaps or non-character gaps.

[0039] That is, the determination unit 72' classifies gaps as character gaps or non-character gaps in accordance with the character type determined by the character type determination unit 51. 73' is an optimum path detection unit. 54 is a segmentation feature evaluation unit whose parameters are changed according to the character type determined by the character type determination unit 51. 55 is a segmentation position determination unit that determines the segmentation positions.

[0040] That is, the optimum path detection unit 73' is a determination threshold changing unit in which the cutout parameters are changed according to the character type determined by the character type determination unit 51. Reference numeral 53 denotes a determination processing unit that makes the inter-character / non-inter-character determination. FIG. 6 shows an example of the character type determination method used by the character type determination unit 51. In FIG. 6, reference numeral 61 denotes an icon display area for setting the input character type.

[0041] In this example, icons (61a to 61f) corresponding to the character types are provided on the tablet. Before writing, the user registers the input character type by pointing with the pen at the icon for the character type to be entered. FIG. 6 shows a state in which input is restricted to numerals and symbols. This method is particularly effective when a specific character type is entered continuously, such as numerals for slip entry or alphabetic characters for English text entry.

[0042] Next, the operation of the determination unit 72' and the optimum path detection unit 73' will be described. The determination made by the determination unit 72' is the same as the determination made by the determination unit 72 of FIG. 1 on the basis of OFF time, FAR time, and space width, and the determination itself is carried out by the internal determination processing unit 53. However, each threshold is changed by the determination threshold changing unit 52 according to the result of the character type determination unit 51.

[0043] For example, when the result of the character type determination unit 51 indicates that the input character string consists of numerals or alphabetic characters, the space width threshold above which a gap is confirmed as an inter-character gap is changed from 3.0 mm to 2.0 mm. This is because numerals and alphabetic characters are never split widely along the writing direction within a single character, unlike kanji, whose left-hand and right-hand radicals can be.

[0044] For numerals, even characters that may be written in separate parts, such as "5" and "7", have an extremely small space between the first and second strokes, so the threshold may be lowered as far as 1.0 mm. The OFF time threshold may also be changed according to the character type. When the optimum path detection unit 73' calculates the feature quantities for the "one-character likelihood" and the "inter-character likelihood", it changes the calculation method and the parameters of those feature quantities according to the character type, based on the result of the character type determination unit 51.
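The threshold switching described above can be illustrated with a minimal Python sketch. The names `GapThresholds`, `thresholds_for`, `is_confirmed_gap`, the string labels for character types, and the `tight_digits` flag are all hypothetical; only the 3.0 mm, 2.0 mm, and 1.0 mm values come from the text. Treating "wider than the threshold" as a strict comparison is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapThresholds:
    # A blank space wider than this (in mm) is confirmed as an inter-character gap.
    confirmed_gap_mm: float


def thresholds_for(char_type: str, tight_digits: bool = False) -> GapThresholds:
    """Pick the confirmed-gap threshold for a character type (labels are hypothetical)."""
    if char_type == "digit" and tight_digits:
        # The text allows lowering the threshold to 1.0 mm for numerals.
        return GapThresholds(confirmed_gap_mm=1.0)
    if char_type in ("digit", "alpha"):
        # Numerals and alphabetic characters: 3.0 mm is lowered to 2.0 mm.
        return GapThresholds(confirmed_gap_mm=2.0)
    # Default value used when kanji may be present.
    return GapThresholds(confirmed_gap_mm=3.0)


def is_confirmed_gap(width_mm: float, char_type: str, tight_digits: bool = False) -> bool:
    """Return True when the space is wide enough to be a confirmed inter-character gap."""
    return width_mm > thresholds_for(char_type, tight_digits).confirmed_gap_mm
```

With these assumed names, a 2.5 mm space is confirmed as an inter-character gap for alphabetic input but not for input that may contain kanji.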

[0045] For example, when the character type determination unit 51 determines that kanji are included, the standard character width is not set to the height of the character string or less. When the character type determination unit 51 determines that the input is something other than kanji, the aspect ratio (width / height) of the standard character is kept at 1.5 or less. Because limits of this kind can be placed on the standard value of each feature quantity, accurate cutout can be performed.

[0046] The method of calculating a feature quantity can also be changed according to the character type. For example, when kanji are included, character shapes vary greatly, so the standard character width is obtained from the maximum width of the basic segments; otherwise it is set equal to the height of the character string. In this first embodiment, however, the character type must be set frequently when ordinary character strings are entered, so it is desirable that the character type can be determined automatically from the handwriting data. A second embodiment, in which the character type is determined automatically, is described next.

[0047] A second embodiment of the character cutting device of the present application will be described with reference to FIG. 7. FIG. 7 is a diagram showing the configuration of the character cutout unit 7. In the figure, parts identical to those in FIG. 1 are omitted, and parts identical to those in FIGS. 1 and 5 are given the same reference numerals to simplify the description. Reference numeral 7 denotes the cutout unit.

[0048] Reference numeral 72 denotes the same determination unit as in FIG. 1. Reference numeral 51' denotes a character type determination unit for the input character string. The character type determination unit 51' determines which character types make up the input character string. Reference numeral 72' denotes a determination unit. The determination unit 72' changes the parameters used for the inter-character / non-inter-character determination according to the character type determined by the character type determination unit 51', and then makes that determination.

[0049] Reference numeral 73' denotes an optimum path detection unit. The optimum path detection unit 73' changes the parameters used for the optimum path determination according to the character type determined by the character type determination unit 51, and determines the cutout positions. That is, the optimum path detection unit 73' is a determination threshold changing unit in which the cutout parameters are changed according to the character type determined by the character type determination unit 51'. Reference numeral 53 denotes a determination processing unit that makes the inter-character / non-inter-character determination.

[0050] The character type determination methods used by the character type determination unit 51' will now be described.

1. Character type determination method 1
As in the conventional device, the determination unit 72 extracts provisional basic segments using the thresholds for OFF time, FAR time, and space width, and counts the strokes in each. When the number of basic segments is at or above a threshold (for example, 5) and every segment has 2 strokes or fewer, the character type determination unit 51' determines that the character string consists of numerals or lowercase alphabetic characters. To raise the confidence further, the condition that one-stroke basic segments occur consecutively may be added.

[0051] Numerals and alphabetic characters are distinguished from each other by examining their shapes and using the features specific to each. As determination conditions, it is particularly effective to note that "0" and "1" tend to occur consecutively in numerals, and to check lowercase alphabetic input for the presence of "i" and "j", whose strokes are separated in the vertical direction.

2. Character type determination method 2
As in the conventional device, the determination unit 72 extracts provisional basic segments using the thresholds for OFF time, FAR time, and space width, and counts the strokes in each.

[0052] When the number of basic segments is at or above a threshold (for example, 5) and every segment has 4 strokes or more, the character type determination unit 51' determines that the character string contains kanji.

3. Character type determination method 3
As in the conventional device, the determination unit 72 extracts provisional basic segments using the thresholds for OFF time, FAR time, and space width. For each segment, a provisional character recognition result (character code) is obtained from the recognition unit 5 of FIG. 1. The character type determination unit 16 then determines the character type from the number and arrangement of results of the same character type.
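Determination methods 1 and 2 above can be sketched as a single stroke-count rule. This is only an illustration: the function name `guess_char_type`, the returned labels, and the `"unknown"` fallback are assumptions, while the segment-count threshold of 5 and the stroke limits of 2 and 4 are the example values given in the text. Method 3, which relies on provisional recognition results, is not covered.

```python
from typing import Sequence


def guess_char_type(stroke_counts: Sequence[int], min_segments: int = 5) -> str:
    """Guess the character type from the stroke count of each provisional basic segment."""
    if len(stroke_counts) < min_segments:
        # Too few segments to decide; this fallback label is an assumption.
        return "unknown"
    if all(n <= 2 for n in stroke_counts):
        # Method 1: every segment has 2 strokes or fewer.
        return "digit_or_lowercase"
    if all(n >= 4 for n in stroke_counts):
        # Method 2: every segment has 4 strokes or more.
        return "contains_kanji"
    return "unknown"
```

For example, five segments with stroke counts 1, 1, 2, 1, 2 would be labeled as numerals or lowercase letters under this rule, while a mixed sequence is left undecided.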

[0053] As described above, numerals, alphabetic characters, kanji, katakana, and so on each have handwriting features specific to the character type. Needless to say, the methods described above are not the only ways of using those handwriting features for automatic character type determination. The determination unit 72' and the optimum path detection unit 73' operate in the same way as in the first embodiment.

[0054] In this second embodiment, however, the character type determination unit 51' determines the character type automatically from the handwriting data, so the determined character type may be wrong. The amount by which the thresholds (the various cutout parameters) are changed in the determination unit 72' should therefore be set smaller than in the first embodiment. Although a dedicated character type determination unit 51' is provided in this embodiment, the result of the recognition unit 5 may be used instead. That is, if the adjacent characters recognized by the recognition unit are of the same character type, the character in question may be presumed to be of that character type as well, and the various cutout parameters may be changed accordingly.

[0055] A third embodiment of the character cutting device of the present application will be described with reference to FIGS. 8 and 9. FIG. 8 is a diagram showing the configuration of the character cutout unit 7. In FIG. 8, parts identical to those in FIG. 1 are omitted, and parts identical to those in FIG. 1 are given the same reference numerals to simplify the description. The optimum path detection unit 73" detects the most likely combination among the combinations of stroke sets obtained from the result of the determination unit 72, and operates in the same way as the conventional unit. The difference is that a post-processing unit 56 is provided.

[0056] The post-processing unit 56 detects the inter-character gaps once more, using as a reference the widths of the spaces that the optimum path detection unit 73" judged to be inter-character gaps. As described for the conventional device with reference to FIG. 3, the optimum path detection unit 73" calculates feature quantities representing the "one-character likelihood" and the "inter-character likelihood", and finds the optimum path by summing them for each path.

[0057] As shown in the aforementioned Japanese Patent Laid-Open No. 6-124364, the feature quantity for the "one-character likelihood" is evaluated using the aspect ratio of the character, the character width, the character size, the maximum space width within the character, and so on. As also shown in that publication, the feature quantity for the "inter-character likelihood" is evaluated using the character pitch, the space width between characters, and so on.

[0058] This character format data is either set in advance or estimated from the input character data. Suppose that, for the input of FIG. 9(a), the optimum path detection unit 73" produces the determination shown in FIG. 9(b). The post-processing unit 56 then detects the widths of the spaces that the optimum path detection unit 73" judged to be inter-character gaps.

[0059] That is, it detects W1, W2, and W3 of FIG. 9(c), judges every blank space wider than the smallest of these widths to be an inter-character gap, and outputs FIG. 9(d). By reusing the inter-character data determined by the optimum path detection unit 73" to cut out the characters again in this way, characters can be cut out accurately even when characters of different sizes are input.
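The post-processing step of the third embodiment can be sketched as follows, assuming the blank spaces are given as a list of widths in millimetres and the gaps accepted by the optimum path detection unit are given as a set of indices into that list. The function name `reclassify_gaps` and this data layout are hypothetical; the rule itself (any blank space wider than the smallest accepted gap is also treated as an inter-character gap) follows the description above.

```python
from typing import List, Sequence, Set


def reclassify_gaps(gap_widths: Sequence[float], accepted: Set[int]) -> List[int]:
    """Return indices of spaces treated as inter-character gaps after post-processing.

    gap_widths: widths (mm) of all blank spaces, left to right.
    accepted:   indices already judged to be inter-character gaps (W1, W2, W3, ...).
    """
    if not accepted:
        # Nothing to use as a reference, so nothing is added.
        return []
    # The smallest accepted width serves as the reference.
    reference = min(gap_widths[i] for i in accepted)
    result = set(accepted)
    for index, width in enumerate(gap_widths):
        if width > reference:
            result.add(index)
    return sorted(result)
```

For instance, with widths `[1.0, 3.0, 2.5, 4.0, 2.0]` and accepted indices `{2, 3}`, the reference is 2.5 mm, so the 3.0 mm space at index 1 is added to the result.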

[0060] With an input such as that of FIG. 9(e), however, characters may be cut out incorrectly. The post-processing unit 73" may therefore judge a space to be an inter-character gap only when it is wider than the smallest of the detected inter-character spaces W1, W2, and W3 and also wider than a predetermined width. A fourth embodiment of the character cutting device of the present application will now be described with reference to FIGS. 10 and 11. FIG. 10 is a diagram showing the configuration of the character cutout unit 7. In FIG. 10, parts identical to those in FIG. 1 are omitted, and parts identical to those in FIGS. 1 and 5 are given the same reference numerals to simplify the description.

[0061] The optimum path detection unit 73"' detects the most likely combination among the combinations of stroke sets obtained from the result of the determination unit 72. The character number determination unit 57 of the optimum path detection unit 73"' estimates the standard character width and the standard character pitch from the input character data. One example of the calculation used to estimate the standard character width and the standard character pitch is shown below.

[0062]
Standard character width = height of character string × constant
Standard character pitch = height of character string × constant

Another example of the calculation used to estimate the standard character width and the standard character pitch is as follows.

Standard character width = maximum width of the basic segments
Standard character pitch = maximum pitch of the basic segments

[0063] The number of written characters is then estimated from these results.

Estimated number of characters = (width of character string − standard character width) ÷ standard character pitch

From the estimated number of characters obtained by the character number determination unit 57, the number of inter-character gaps present can be estimated. The operation of the character number determination unit 57 in this fourth embodiment will be described using the input of FIG. 2 as an example.
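As a rough illustration, the first estimation example (both standard values proportional to the string height) and the character count formula can be written as below. The function name `estimate_char_count` and the default constants of 1.0 are hypothetical, since the text does not give the constants. Rounding the quotient to the nearest integer and clamping the result to at least 1 are also assumptions added here so that the sketch returns a usable count.

```python
def estimate_char_count(
    string_width: float,
    string_height: float,
    width_const: float = 1.0,   # assumed constant; not specified in the text
    pitch_const: float = 1.0,   # assumed constant; not specified in the text
) -> int:
    """Estimate the number of written characters from the string's bounding box."""
    std_width = string_height * width_const   # standard character width
    std_pitch = string_height * pitch_const   # standard character pitch
    estimate = (string_width - std_width) / std_pitch
    # Rounding and the lower bound of 1 are assumptions of this sketch.
    return max(1, round(estimate))
```

A caller would pass the measured width and height of the handwritten string in the same unit, for example `estimate_char_count(30.0, 10.0)`.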

[0064] As shown in FIG. 2, the space widths E7 between the basic segments E5 are, from the left, -1 mm, 4 mm, 2 mm, 3.5 mm, and 1 mm. Sorted in descending order, these space widths are 4 mm, 3.5 mm, 2 mm, 1 mm, and -1 mm. Assume that the above formula yields an estimated number of characters of "2 characters".

[0065] In the example of FIG. 2, the positions between b and c and between d and e are confirmed cutout positions, as shown at E4, so the correct estimate would be at least three characters. An estimate of two for the example of FIG. 2 arises when one character is written large, as in FIG. 11. An estimated number of characters of "2 characters" means that there is one inter-character gap. Since the largest space width is 4 mm, this 4 mm space is presumed to be the inter-character gap, and the second largest space, 3.5 mm, is presumed to be a space within a character.

[0066] In other words, the space width reference lies between 4 mm and 3.5 mm. The space width reference is a threshold such that a space wider than this value can be judged "likely an inter-character gap" and a space narrower than this value can be judged "likely not an inter-character gap". Here, the space width reference is set to 3.75 mm, midway between 4 mm and 3.5 mm.
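The derivation of the space width reference can be checked with a short sketch. It assumes the reference is always taken as the midpoint between the narrowest space presumed to be an inter-character gap and the widest space presumed to lie within a character, as in the 3.75 mm example above; the function name `space_width_reference` is hypothetical.

```python
from typing import Sequence


def space_width_reference(gap_widths: Sequence[float], estimated_chars: int) -> float:
    """Midpoint between the narrowest presumed gap and the widest presumed in-character space."""
    n_gaps = estimated_chars - 1          # N characters imply N - 1 inter-character gaps
    ordered = sorted(gap_widths, reverse=True)
    if n_gaps < 1 or n_gaps >= len(ordered):
        raise ValueError("estimated character count is incompatible with the number of spaces")
    narrowest_gap = ordered[n_gaps - 1]   # last space still presumed to be a gap
    widest_inner = ordered[n_gaps]        # first space presumed to be inside a character
    return (narrowest_gap + widest_inner) / 2


# Space widths from FIG. 2, left to right, with an estimate of 2 characters.
reference = space_width_reference([-1.0, 4.0, 2.0, 3.5, 1.0], 2)
```

With the FIG. 2 widths and an estimate of two characters, `reference` evaluates to 3.75, matching the value in the text.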

[0067] For a stroke set E6, the further the maximum space width within the stroke set falls below this value, the higher the "one-character likelihood" evaluation becomes and the harder the set is to split. Conversely, the further the space width between stroke sets exceeds this value, the higher the "inter-character likelihood" evaluation becomes and the more readily a cut is made between those stroke sets. In short, the character number determination unit 57 estimates the number of characters and changes the space width reference of the cutout feature evaluation unit 54 according to that estimate.

[0068] The optimum path detection unit 73"' is modified in this way and otherwise performs the cutout processing in the same way as the conventional unit. It obtains the feature quantities for the "one-character likelihood" and the "inter-character likelihood" and, as before, finds the optimum path by summing them along each path. Suppose that in this case the path 2-5-8 has the highest score and the path 2-4-6-8 is second.

[0069] Character recognition and language processing may be applied to this result, as in the conventional device, to obtain the optimum character string. However, the combination 2-5-8 that obtained the highest score contains "3 characters", which differs from the estimated "2 characters". The feature quantities used to obtain the "one-character likelihood" and the "inter-character likelihood" for this result were evaluation values for the case of "2 characters", so the "one-character likelihood" and the "inter-character likelihood" may be evaluated again on the basis of the newly obtained information of "3 characters".

[0070] An example with such re-evaluation will be described as a fifth embodiment. The fifth embodiment of the character cutting device of the present application will be described with reference to FIG. 12. FIG. 12 is a diagram showing the configuration of the character cutout unit 7. In FIG. 12, parts identical to those in FIG. 1 are omitted, and parts identical to those in FIGS. 1 and 10 are given the same reference numerals to simplify the description.

[0071] The optimum path detection unit 73"" detects the most likely combination among the combinations of stroke sets obtained from the result of the determination unit 72. The character number change detection unit 58 of the optimum path detection unit 73"" detects the case where the number of characters in the cutout result is greater than the number of characters estimated by the character number determination unit 57. The new number of characters is then set again in the character number determination unit 57, and the cutout by the optimum path detection unit 73"" is performed again.

[0072] In the example of FIG. 2, the character number determination unit 57 of the optimum path detection unit 73"" changes the space width reference that was calculated and used for the case of "2 characters". As stated above, the space widths sorted in descending order are 4 mm, 3.5 mm, 2 mm, 1 mm, and -1 mm. A character count of "3 characters" means that there are two inter-character gaps.

[0073] The 4 mm and 3.5 mm spaces are therefore presumed to be inter-character gaps, and the third largest space, 2 mm, is presumed to be a space within a character. That is, for the evaluation of space width, the reference is reset to 3 mm, which lies between 3.5 mm and 2 mm. The character pitch is recalculated with the following formula.

[0074]
Standard character pitch = (width of character string − standard character width) ÷ number of characters in the result

In this re-evaluation, the standard character width, the aspect ratio of the standard character, and the standard character size are not changed. Using the feature evaluation values revised as described above, the optimum path detection unit calculates the "one-character likelihood" and the "inter-character likelihood" and obtains the cutout result.

[0075] If the character number change detection unit 58 detects that the "number of characters" in this cutout result has increased further, the process may be performed again, but even a single pass is effective. The standard character width, the aspect ratio of the standard character, and the standard character size are left unchanged during the re-evaluation based on the number of characters in the cutout result so that input mixing large and small characters can still be handled.
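The re-evaluation flow of the fifth embodiment can be outlined as a small control loop. This is a sketch under stated assumptions, not the device's actual processing: `segment` stands for a caller-supplied function (hypothetical) that runs the cutout with parameters derived from an assumed character count and returns the list of cut-out characters, and `max_passes` is an assumed limit on how many times the cutout is run. The pitch formula is the one given in the text.

```python
from typing import Callable, List, Tuple


def recomputed_pitch(string_width: float, std_width: float, result_chars: int) -> float:
    """Standard character pitch recalculated from the number of characters in the result."""
    return (string_width - std_width) / result_chars


def cut_with_recount(
    segment: Callable[[int], List[str]],  # hypothetical cutout routine
    estimated_chars: int,
    max_passes: int = 2,                  # assumed limit; the text notes one retry already helps
) -> Tuple[List[str], int]:
    """Re-run the cutout while the result contains more characters than were assumed."""
    assumed = estimated_chars
    result = segment(assumed)
    for _ in range(max_passes - 1):
        if len(result) <= assumed:
            break                         # the count did not increase, so no re-evaluation
        assumed = len(result)             # reset the character count and cut out again
        result = segment(assumed)
    return result, assumed
```

Keeping the loop bounded reflects the remark that a single re-evaluation is already effective; a larger `max_passes` would allow the repeated re-evaluation that the text also permits.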

【００７６】[0076]

[Effects of the Invention] According to the character cutting device of claims 1 to 5 and 10 of the present application, the character type composition of the input character string is determined and character cutout processing suited to the result of that determination is performed, so the character cutout accuracy can be improved. According to the character cutting device of claims 4 and 5 of the present application, the character type composition of the input character string can be determined automatically and character cutout processing suited to the result of that determination can be performed.

[0077] According to the character cutting device of claims 6 to 9 of the present application, the character cutout accuracy can be improved even for irregular character data that varies in size and other respects. According to the character cutting method of claims 11 and 12 of the present application, the criteria for judging the one-character likelihood and the inter-character likelihood can be set individually. Therefore, by making one of them a value suited to large characters (for example, the character width) and the other a value suited to small characters (for example, the character pitch), the cutout accuracy when irregular characters are input can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a conventional character cutting device.

FIG. 2 is a diagram for explaining the method of extracting basic segments and stroke sets in the character cutting device.

FIG. 3 is a diagram in which the cutout evaluation values are represented as a network.

FIG. 4 is a diagram showing an example of input to the character cutting device.

FIG. 5 is a diagram schematically showing the cutout unit 7 of the second embodiment of the present invention.

FIG. 6 is a diagram for explaining an example of the character type determination method of the second embodiment.

FIG. 7 is a diagram schematically showing the cutout unit 7 of the second embodiment of the present invention.

FIG. 8 is a diagram schematically showing the cutout unit 7 of the third embodiment of the present invention.

FIG. 9 is a diagram for explaining the operation of the third embodiment.

FIG. 10 is a diagram schematically showing the cutout unit 7 of the fourth embodiment of the present invention.

FIG. 11 is a diagram for explaining the operation of the fourth embodiment.

FIG. 12 is a diagram schematically showing the cutout unit 7 of the fifth embodiment of the present invention.

[Explanation of symbols]

1: input unit (input means); 5: character recognition unit (recognition means); 7: cutout unit; 71: stroke sequence extraction unit; 72': determination unit; 73': optimum path detection unit; 73": optimum path detection unit; 73"': optimum path detection unit; 73"": optimum path detection unit; 51: character type determination unit (character type determination means); 51': character type determination unit (character type determination means); 52: determination threshold changing unit (changing means); 53: determination processing unit (cutout position detecting means); 54: cutout feature evaluation unit (changing means); 55: cutout position determination unit (cutout position detecting means); 56: post-processing unit; 57: character number determination unit (character number estimating means); 58: character number change detection unit (reprocessing instruction means).

Claims

[Claims]

1. A character cutting device that has input means (1) for inputting character data and cuts out characters from the character data input by the input means (1), the device comprising: character type determination means (51, 51') for determining the character type of the input character set; and changing means (52, 54) for changing a cutout parameter based on the character type determined by the character type determination means (51, 51').

2. A character cutting device that has input means (1) for inputting character data and cuts out characters from the character data input by the input means (1), the device comprising: character type determination means (51, 51') for determining the character type of the input character set; and changing means (52, 54) for selectively using a plurality of cutout methods based on the character type determined by the character type determination means (51, 51').

3. The character cutting device according to claim 1, wherein the determination by the character type determination means (51) is made based on the setting of a character type setting mode.

4. The character cutting device according to claim 1, wherein the character type judging means (51 ′) makes the judgment based on the number of strokes or the shape of the character data.

5. The character cutting device according to claim 1 or 2, further comprising recognition means (5, 10) for recognizing the cut-out characters, wherein the determination by the character type determination means is made based on a result of the recognition means (5, 10).

6. A character cutting device comprising: input means (1) for inputting character data; character number estimating means (57) for estimating the number of characters from the character data input by the input means (1); changing means (54) for changing a cutout parameter based on the number of characters obtained by the character number estimating means (57); and cutout position detecting means (55) for detecting a cutout position using the cutout parameter changed by the changing means (54).

7. A character cutting device comprising: input means (1) for inputting character data; character number estimating means (57) for estimating the number of characters from the character data input by the input means (1); changing means (54) for changing a cutout parameter based on the number of characters obtained by the character number estimating means (57); cutout position detecting means (55) for detecting a cutout position using the cutout parameter changed by the changing means (54); and reprocessing instruction means (58) for causing the character cutout processing to be performed again based on the number of characters in the result obtained by the cutout position detecting means (55).

8. A character cutting device comprising: input means (1) for inputting character data; character number estimating means (57) for estimating the number of characters from the character data input by the input means (1); changing means (54) for changing a cutout parameter based on the number of characters obtained by the character number estimating means (57); cutout position detecting means (55) for detecting a cutout position using the cutout parameter changed by the changing means (54); and reprocessing instruction means (58) for comparing the number of characters resulting from the cutout position detecting means (55) with the number of characters obtained by the character number estimating means (57) and, when the former is smaller than the latter, performing the character cutting process again based on the number of characters resulting from the cutout position detecting means (55).

9. A character cutting device comprising: input means (1) for inputting character data; character number estimating means (57) for estimating the number of characters from the character data input by the input means (1); changing means (54) for changing a cutout parameter based on the number of characters obtained by the character number estimating means (57); cutout position detecting means (55) for detecting a cutout position using the cutout parameter changed by the changing means (54), by evaluating at least the likeness to a character and the likeness to a space between characters separately and combining the separate evaluations; and reprocessing instruction means (58) for performing the character cutting process again based on the number of characters resulting from the cutout position detecting means (55), wherein, when the character cutting process is performed again, the cutout position detecting means (55) evaluates the likeness to a character based on the number of characters obtained by the character number estimating means (57) and evaluates the likeness to a space between characters based on the number of characters resulting from the cutout position detecting means (55).

10. A character cutting device that has input means (1) for inputting handwriting data and cuts out characters from the handwriting data input by the input means (1), the device comprising: character type judging means (51, 51'); changing means (52, 54) for changing a cutout parameter based on the character type judged by the character type judging means (51, 51'); and cutout position detecting means (53, 55) for detecting a cutout position using the cutout parameter changed by the changing means (52, 54).

11. A character cutting method for a character cutting device that detects a cutout position by judging the likeness to a single character and the likeness to a space between characters of character data input from input means (1), wherein the reference data for judging the likeness to a single character and the reference data for judging the likeness to a space between characters are set individually.

12. A character cutting method for a character cutting device that detects a cutout position by judging, using at least a reference character width and a reference character pitch, the likeness to a single character and the likeness to a space between characters of character data input from input means (1), wherein the values of the reference character width and the reference character pitch are set individually.
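
The flow recited in claims 6 to 8 (estimate the number of characters, change the cutout parameter, detect cutout positions, and reprocess when fewer characters are found than estimated) can be illustrated with the following minimal Python sketch. It is not the implementation disclosed in this publication. The stroke representation as horizontal extents, the height-based estimate in `estimate_char_count`, the gap threshold `gap_ratio`, and all function names are assumptions introduced only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: each stroke is its (left, right) horizontal extent.
Stroke = Tuple[float, float]


def estimate_char_count(strokes: List[Stroke], line_height: float) -> int:
    """Assumed heuristic standing in for the estimating means (57):
    characters are taken to be about as wide as the line is tall."""
    left = min(s[0] for s in strokes)
    right = max(s[1] for s in strokes)
    return max(1, round((right - left) / line_height))


def detect_cuts(strokes: List[Stroke], ref_width: float, gap_ratio: float) -> List[float]:
    """Stand-in for the detecting means (55): cut at the middle of every
    horizontal gap wider than gap_ratio * ref_width."""
    ordered = sorted(strokes)
    cuts: List[float] = []
    reach = ordered[0][1]  # rightmost x covered so far
    for left, right in ordered[1:]:
        if left - reach > gap_ratio * ref_width:
            cuts.append((reach + left) / 2)
        reach = max(reach, right)
    return cuts


def segment(strokes: List[Stroke], line_height: float, gap_ratio: float = 0.3) -> List[float]:
    """Estimate, change the parameter, detect, then reprocess if needed."""
    total = max(s[1] for s in strokes) - min(s[0] for s in strokes)
    estimated = estimate_char_count(strokes, line_height)
    # Changing means (54): derive the reference width from the estimated count.
    ref_width = total / estimated
    cuts = detect_cuts(strokes, ref_width, gap_ratio)
    found = len(cuts) + 1
    if found < estimated:
        # Reprocessing (58), as in claim 8: redo the cut using the count
        # that the first detection pass actually produced.
        ref_width = total / found
        cuts = detect_cuts(strokes, ref_width, gap_ratio)
    return cuts


if __name__ == "__main__":
    wide_chars = [(0.0, 18.0), (30.0, 48.0), (60.0, 78.0)]
    print(segment(wide_chars, line_height=10.0))  # [24.0, 54.0]
```

In this example the height-based estimate yields eight characters, while the first detection pass finds only three, so the reference width is recomputed from three characters and the detection is repeated. The cutout positions stay at 24.0 and 54.0 because both gaps remain wider than the enlarged threshold.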