TWI692696B - Text mining support method and device

Info

Publication number: TWI692696B
Application number: TW107106049A
Authority: TW
Inventors: 西川康平
Original assignee: 日商斯庫林集團股份有限公司
Priority date: 2017-03-15
Filing date: 2018-02-23
Publication date: 2020-05-01
Also published as: KR20180105566A; JP6829117B2; JP2018152023A; CN108628928A; CN108628928B; KR102230102B1; TW201835790A

Abstract

A text mining support method and device that, when displaying a scatter diagram representing the result of a correspondence analysis, display a support screen containing the scatter diagram and a hint indicating how to read it. When displaying a scatter diagram relating words and variables, the device displays whichever of the following screens the user specifies: a basic screen containing no hint; a first support screen whose hint is how to judge words near the origin; a second support screen whose hint is how to judge the degree of association of the words that characterize a variable; a third support screen whose hint is how to judge the similarity between words; and a fourth support screen whose hint is how to judge the similarity between variables. This makes it possible to efficiently derive insights from a chart representing the result of a correspondence analysis.

Description

Text mining support method and device

The present invention relates to data mining technology, and in particular to a text mining support method and device that support the execution of text mining.

In recent years, data mining, which applies data analysis techniques such as statistics and pattern recognition to large volumes of data and derives insights (such as rules that appear in the data) from them, has been attracting attention. Data mining that targets text data is called text mining. The following considers the case of applying correspondence analysis, one such data analysis technique, to text data.

In correspondence analysis, the items of a cross-tabulation table are rearranged so that the association between the column-header items and the row-header items is maximized. The result of a correspondence analysis is usually presented as a scatter diagram (a two-dimensional chart). For example, applying correspondence analysis to the cross-tabulation table shown in FIG. 2 yields the scatter diagram shown in FIG. 3.
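
For reference, the usual textbook formulation of correspondence analysis can be written as follows. This is a sketch of the standard method, not a statement of how the analysis in this document is actually computed; the symbols N, n, P, r, c, S, F, and G are introduced here for illustration and do not appear in the source.

```latex
% N: I x J cross-tabulation table of frequencies, n: grand total of N
P = \frac{1}{n} N, \qquad r = P\,\mathbf{1}, \qquad c = P^{\mathsf{T}}\mathbf{1},
\qquad D_r = \operatorname{diag}(r), \qquad D_c = \operatorname{diag}(c)

% Standardized residuals and their singular value decomposition
S = D_r^{-1/2}\,\bigl(P - r\,c^{\mathsf{T}}\bigr)\,D_c^{-1/2} = U \Sigma V^{\mathsf{T}}

% Principal coordinates of the row items (F) and the column items (G)
F = D_r^{-1/2}\, U \Sigma, \qquad G = D_c^{-1/2}\, V \Sigma

% Contribution rate of the k-th component (sigma_k: k-th singular value)
\text{contribution rate}_k = \frac{\sigma_k^{2}}{\sum_{l} \sigma_l^{2}}
```

Under this formulation, the first two columns of F and G would supply the first and second components plotted on the horizontal and vertical axes of the scatter diagram. Which coordinate scaling is actually used is not stated in the source.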

Japanese Patent Laid-Open No. 2005-44087 (Patent Document 1), which is related to the present invention, describes a text mining system that presents the user with an analysis flow for using multiple analysis tools. With the system described in that document, even a user with little knowledge or experience of text mining can carry out an analysis using multiple analysis tools in an appropriate order.

[Problem to Be Solved by the Invention] In correspondence analysis, examining the resulting scatter diagram and deriving insights from it matters more than producing the diagram. However, a user with little knowledge or experience of text mining does not know how to read a scatter diagram, and so does not know where to start even when looking at one. Such a user therefore cannot efficiently derive insights from the scatter diagram.

The system described in Patent Document 1 presents an analysis flow to the user, but it does not support the process of deriving insights from the analysis result. Using that system therefore does not solve the problem described above.

Accordingly, an object of the present invention is to provide a text mining support method and device for efficiently deriving insights from a chart representing the result of a correspondence analysis.

[Means for Solving the Problem] To achieve the above object, the present invention has the following features.

A first aspect of the present invention is a text mining support method for displaying an analysis result obtained by correspondence analysis, the method including: a step of inputting the analysis result; a step of inputting an instruction from a user; a step of generating screen data for a screen containing a chart representing the analysis result; and a step of displaying the screen based on the screen data, wherein, in response to the instruction, the step of generating screen data generates screen data for a support screen containing the chart and a hint indicating how to read the chart.

In a second aspect of the present invention, in the first aspect, the step of generating screen data generates screen data for a screen selected, in accordance with the instruction, from among a plurality of support screens and a basic screen that contains the chart but not the hint.

In a third aspect of the present invention, in the second aspect, the step of inputting the analysis result inputs, as the analysis result, a result that associates first items with second items, namely a result containing a first component and a second component for each first item and a first component and a second component for each second item, and the step of generating screen data creates, as the chart, a scatter diagram in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

In a fourth aspect of the present invention, in the third aspect, the plurality of support screens include a first support screen whose hint states that first items near the origin of the scatter diagram have no distinctive features.

In a fifth aspect of the present invention, in the fourth aspect, the scatter diagram contained in the first support screen shows the range near the origin.

In a sixth aspect of the present invention, in the third aspect, the plurality of support screens include a second support screen whose hint states that a first item located, in the scatter diagram, in the direction leading away from the origin toward a second item characterizes that second item.

In a seventh aspect of the present invention, in the sixth aspect, the scatter diagram contained in the second support screen shows the range of directions leading away from the origin toward the selected second item.

In an eighth aspect of the present invention, in the third aspect, the plurality of support screens include a third support screen whose hint states that first items close to each other in the scatter diagram are highly similar to each other.

In a ninth aspect of the present invention, in the eighth aspect, the scatter diagram contained in the third support screen shows the range around the selected first item.

In a tenth aspect of the present invention, in the third aspect, the plurality of support screens include a fourth support screen whose hint states that second items close to each other in the scatter diagram are highly similar to each other.

In an eleventh aspect of the present invention, in the tenth aspect, the scatter diagram contained in the fourth support screen shows a symbol indicating the second item closest to the selected second item.

In a twelfth aspect of the present invention, in the third aspect, the analysis result input in the step of inputting the analysis result is the result of a correspondence analysis of a cross-tabulation table in which words are the first items, parts of a document are the second items, and the frequency of occurrence of each word in each part of the document is the table data.

A thirteenth aspect of the present invention is a text mining support device for displaying an analysis result obtained by correspondence analysis, the device including: an analysis result input unit for inputting the analysis result; an instruction input unit for inputting an instruction from a user; a screen generation unit that generates screen data for a screen containing a chart representing the analysis result; and an analysis result display unit that displays the screen based on the screen data, wherein, in response to the instruction, the screen generation unit generates screen data for a support screen containing the chart and a hint indicating how to read the chart.

In a fourteenth aspect of the present invention, in the thirteenth aspect, the screen generation unit generates screen data for a screen selected, in accordance with the instruction, from among a plurality of support screens and a basic screen that contains the chart but not the hint.

In a fifteenth aspect of the present invention, in the fourteenth aspect, the analysis result input unit receives, as the analysis result, a result that associates first items with second items, namely a result containing a first component and a second component for each first item and a first component and a second component for each second item, and the screen generation unit creates, as the chart, a scatter diagram in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component.

In a sixteenth aspect of the present invention, in the fifteenth aspect, the analysis result received by the analysis result input unit is the result of a correspondence analysis of a cross-tabulation table in which words are the first items, parts of a document are the second items, and the frequency of occurrence of each word in each part of the document is the table data.

[Effects of the Invention] According to the first or thirteenth aspect, the user can use a support screen containing a chart representing the result of a correspondence analysis and a hint indicating how to read the chart, and can thereby efficiently derive insights from that chart.

According to the second or fourteenth aspect, selectively displaying support screens that contain a hint and a basic screen that does not makes it possible to display a screen suited to the user's level. In addition, selectively displaying a plurality of support screens makes it possible to present the user with several ways of reading the chart.

According to the third or fifteenth aspect, the user can efficiently derive insights from a scatter diagram representing the result of a correspondence analysis relating first items and second items.

According to the fourth aspect, the user can apply the knowledge that first items near the origin of the scatter diagram have no distinctive features, and can thereby efficiently derive insights from the chart representing the result of the correspondence analysis.

According to the fifth aspect, the user can look at the range shown and easily identify the first items that have no distinctive features.

According to the sixth aspect, the user can apply the knowledge that a first item located, in the scatter diagram, in the direction leading away from the origin toward a second item characterizes that second item, and can thereby efficiently derive insights from the chart representing the result of the correspondence analysis.

According to the seventh aspect, the user can look at the range shown and easily identify the first items that characterize the selected second item.

According to the eighth aspect, the user can apply the knowledge that first items close to each other in the scatter diagram are highly similar, and can thereby efficiently derive insights from the chart representing the result of the correspondence analysis.

According to the ninth aspect, the user can look at the range shown and easily identify the first items that are highly similar to the selected first item.

According to the tenth aspect, the user can apply the knowledge that second items close to each other in the scatter diagram are highly similar, and can thereby efficiently derive insights from the chart representing the result of the correspondence analysis.

According to the eleventh aspect, the user can look at the symbol shown and easily identify the second item most similar to the selected second item.

According to the twelfth or sixteenth aspect, the user can efficiently derive insights from a scatter diagram representing the result of a correspondence analysis relating words and parts of a document.

These and other objects, features, aspects, and effects of the present invention will become more apparent from the following detailed description taken with reference to the accompanying drawings.

A text mining support method, a text mining support device, and a text mining support program according to an embodiment of the present invention are described below with reference to the drawings. The text mining support method of this embodiment is typically executed using a computer, and the text mining support device of this embodiment is typically built using a computer. The text mining support program of this embodiment is a program for carrying out the text mining support method on a computer. A computer that executes the text mining support program functions as the text mining support device.

FIG. 1 is a block diagram showing the configuration of a text mining support device according to an embodiment of the present invention. The text mining support device 10 shown in FIG. 1 includes an analysis result input unit 11, an instruction input unit 12, a screen generation unit 13, and an analysis result display unit 14. The result of a correspondence analysis of text data is input to the text mining support device 10, which displays on a screen a scatter diagram representing the input analysis result.

In FIG. 1, a text analysis device 5 is placed upstream of the text mining support device 10. Text data 1 is input to the text analysis device 5. In the following description, the text data 1 is document data having multiple parts (hereinafter called "chapters"). In the context of correspondence analysis, a "chapter" is also called a "variable". The text analysis device 5 extracts the words contained in the text data 1 and creates a cross-tabulation table in which the words are the row-header items, the chapters are the column-header items, and the frequency of occurrence of each word in each chapter is the table data. The text analysis device 5 performs a correspondence analysis on the created table and outputs an analysis result 2. A correspondence analysis yields two or more components that represent the characteristics of the data being processed. The analysis result 2 contains at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component.

FIG. 2 is a diagram showing a cross-tabulation table that is subjected to correspondence analysis. The table shown in FIG. 2 is created by inputting the text of the novel "No Longer Human" (人類失格) to the text analysis device 5 as the text data 1. This Japanese novel has five chapters, "Preface", "First Note", "Second Note", "Third Note", and "Postscript", and contains words such as "self", "human", "flatfish", and "mood". The table shown in FIG. 2 has words such as "self", "human", "flatfish", and "mood" as its row-header items and the five variables (chapters) "Preface", "First Note", "Second Note", "Third Note", and "Postscript" as its column-header items. The word "human" appears 38 times in "First Note". Accordingly, in the table shown in FIG. 2, the cell (hatched) whose row-header item is "human" and whose column-header item is "First Note" contains 38. So that the correspondence analysis can be performed properly, the table shown in FIG. 2 contains only words whose frequency of occurrence is at or above a prescribed value.
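
The construction of such a table can be sketched as follows. This is a minimal illustration, not the text analysis device's actual procedure: the function name `build_cross_table`, the `min_total` parameter, and the choice to apply the threshold to a word's total frequency over all chapters are assumptions, and the input is assumed to be already split into word tokens (for Japanese text this would require a separate morphological analysis step).

```python
from collections import Counter


def build_cross_table(chapters, min_total=2):
    """Build a word-by-chapter frequency table.

    chapters: dict mapping a chapter name to its list of word tokens.
    min_total: hypothetical threshold on a word's total frequency.
    Returns (words, chapter_names, table), where table[i][j] is the
    frequency of words[i] in chapter_names[j].
    """
    chapter_names = list(chapters)
    counts = {name: Counter(tokens) for name, tokens in chapters.items()}

    # Total frequency of each word over all chapters.
    totals = Counter()
    for chapter_counts in counts.values():
        totals.update(chapter_counts)

    # Keep only words that reach the threshold (row-header items).
    words = sorted(w for w, n in totals.items() if n >= min_total)

    # One row per word, one column per chapter (column-header items).
    table = [[counts[name][w] for name in chapter_names] for w in words]
    return words, chapter_names, table


# Toy input; the tokens are invented for illustration only.
example_chapters = {
    "Preface": ["human", "self", "human"],
    "First Note": ["self", "human", "mood", "self"],
}
example_words, example_names, example_table = build_cross_table(example_chapters)
print(example_words)   # ['human', 'self']
print(example_table)   # [[2, 1], [1, 2]]
```

In this toy input "mood" occurs only once in total, so it falls below the assumed threshold and is left out of the table.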

FIG. 3 is a diagram showing a scatter diagram created by the text mining support device 10. As described above, the analysis result 2 input to the text mining support device 10 contains at least the first and second components of each word, the first and second components of each variable, the contribution rate of the first component, and the contribution rate of the second component. The screen generation unit 13 creates a scatter diagram by plotting the words and variables in a plane whose horizontal axis is the first component and whose vertical axis is the second component. For example, the scatter diagram shown in FIG. 3 is created from the analysis result 2 for the cross-tabulation table shown in FIG. 2. The analysis result display unit 14 displays a screen containing the created scatter diagram.

In FIG. 3, a filled circle marks the position of each word and a hollow square marks the position of each variable; words are set in upright type and variables in italics. FIG. 3 also shows the contribution rate of the first component and that of the second component. In general, the contribution rate of the first component is larger than that of the second. Taking this into account, the distance d between two points P(p_1, p_2) and Q(q_1, q_2) in the scatter diagram is defined by formula (1) below, using the contribution rate k_1 of the first component and the contribution rate k_2 of the second component.

$$ d = \sqrt{\{k_1 (p_1 - q_1)\}^2 + \{k_2 (p_2 - q_2)\}^2} \tag{1} $$

In the following description, "distance" means the distance in the scatter diagram as defined by formula (1). Under this distance, a circle drawn in the scatter diagram appears as an ellipse that is shorter in the first-component direction than in the second-component direction.
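
Formula (1) can be illustrated with a short function. This is a minimal sketch; the function name `weighted_distance` and the sample contribution rates 0.6 and 0.3 are assumptions chosen for illustration, not values from the source.

```python
import math


def weighted_distance(p, q, k1, k2):
    """Distance of formula (1) between points p and q.

    Each component difference is scaled by the contribution rate of
    that component (k1 for the first, k2 for the second) before the
    Euclidean norm is taken.
    """
    return math.sqrt((k1 * (p[0] - q[0])) ** 2 + (k2 * (p[1] - q[1])) ** 2)


# With k1 > k2, a unit step along the first component counts for more
# than a unit step along the second component.
step_first = weighted_distance((0.0, 0.0), (1.0, 0.0), k1=0.6, k2=0.3)
step_second = weighted_distance((0.0, 0.0), (0.0, 1.0), k1=0.6, k2=0.3)
print(step_first > step_second)  # True
```

Because a step along the first component is weighted more heavily, the set of points at a fixed distance from a centre extends less far along the first component than along the second, which is why such a "circle" is drawn as an ellipse.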

FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support device 10. The computer 20 shown in FIG. 4 includes a central processing unit (CPU) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a dynamic random access memory (DRAM). The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is, for example, a non-transitory recording medium such as a compact disc read-only memory (CD-ROM) or a digital video disc read-only memory (DVD-ROM).

When the computer 20 executes the text mining support program 31, the storage unit 23 stores the text mining support program 31 and the analysis result 2. The program 31 and the analysis result 2 may, for example, have been received from a server or another computer via the communication unit 26, or read from the recording medium 30 via the recording medium reading unit 27.

When the text mining support program 31 is executed, the program 31 and the analysis result 2 are copied to the main memory 22. Using the main memory 22 as working memory, the CPU 21 executes the text mining support program 31 stored in the main memory 22 and thereby processes the analysis result 2 stored there. The computer 20 then functions as the text mining support device 10. The configuration of the computer 20 described above is only an example; the text mining support device 10 can be built using any computer.

A user with knowledge or experience of text mining knows the following about a scatter diagram representing the result of a correspondence analysis, and can use this knowledge to derive insights from the diagram.
First knowledge: "Words near the origin have no distinctive features."
Second knowledge: "A word located in the direction leading away from the origin toward a variable is strongly associated with that variable and characterizes it."
Third knowledge: "Words that are close to each other are highly similar."
Fourth knowledge: "Variables that are close to each other are highly similar."
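
The four pieces of knowledge above can be sketched as simple geometric tests on the plotted points. This is an illustrative sketch only: the function names, the `radius` and `max_angle_deg` thresholds, the use of the plain angle between directions for the second test, and the sample coordinates are all assumptions, not details taken from the source. Distances use formula (1).

```python
import math


def dist(p, q, k1, k2):
    """Distance of formula (1), weighted by the contribution rates."""
    return math.hypot(k1 * (p[0] - q[0]), k2 * (p[1] - q[1]))


def near_origin(words, k1, k2, radius):
    """First knowledge: words near the origin have no distinctive features."""
    return [w for w, p in words.items() if dist(p, (0.0, 0.0), k1, k2) <= radius]


def characterizing(words, variable_pos, max_angle_deg):
    """Second knowledge: words lying in the direction of a variable."""
    target = math.atan2(variable_pos[1], variable_pos[0])
    hits = []
    for w, (x, y) in words.items():
        if x == 0 and y == 0:
            continue  # the origin itself has no direction
        diff = abs(math.atan2(y, x) - target)
        diff = min(diff, 2 * math.pi - diff)  # wrap around +/-180 degrees
        if math.degrees(diff) <= max_angle_deg:
            hits.append(w)
    return hits


def similar_words(words, selected, k1, k2, radius):
    """Third knowledge: words close to the selected word are similar to it."""
    center = words[selected]
    return [w for w, p in words.items()
            if w != selected and dist(p, center, k1, k2) <= radius]


def nearest_variable(variables, selected, k1, k2):
    """Fourth knowledge: the closest variable is the most similar one."""
    center = variables[selected]
    others = [v for v in variables if v != selected]
    return min(others, key=lambda v: dist(variables[v], center, k1, k2))


# Invented coordinates and contribution rates, for illustration only.
K1, K2 = 0.6, 0.3
WORDS = {"self": (0.05, 0.02), "flatfish": (1.0, 0.9),
         "mood": (0.9, 1.1), "human": (-0.8, 0.4)}
VARIABLES = {"Preface": (-1.0, 0.5), "Third Note": (1.2, 1.0),
             "Postscript": (1.0, 1.3)}

print(near_origin(WORDS, K1, K2, radius=0.1))                      # ['self']
print(characterizing(WORDS, VARIABLES["Third Note"], 15))          # ['flatfish', 'mood']
print(similar_words(WORDS, "flatfish", K1, K2, radius=0.1))        # ['mood']
print(nearest_variable(VARIABLES, "Third Note", K1, K2))           # Postscript
```

The second test compares angles in the raw coordinate plane; whether directions should also be weighted by the contribution rates is left open here.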

A user with little knowledge or experience of text mining, on the other hand, lacks this knowledge and therefore cannot efficiently derive insights from the scatter diagram. To solve this problem, the text mining support device 10 not only displays a screen containing the scatter diagram as a basic screen but also, in response to an instruction from the user, displays as a support screen a screen containing the scatter diagram and a hint indicating how to read it.

The operation of each unit of the text mining support device 10 is described with reference to FIG. 1. The analysis result 2 output from an external device (for example, the text analysis device 5) is input to the analysis result input unit 11. An instruction from the user is input to the instruction input unit 12. The screen generation unit 13 creates a scatter diagram representing the analysis result 2 and generates screen data for a screen containing the scatter diagram. In response to the user's instruction input via the instruction input unit 12, the screen generation unit 13 selectively generates either screen data for a support screen containing the scatter diagram and a hint, or screen data for the basic screen containing the scatter diagram but no hint. The analysis result display unit 14 displays a screen based on the screen data generated by the screen generation unit 13. In the following, the text mining support device 10 displays four kinds of support screen, called the first to fourth support screens.

FIG. 5 is a flowchart showing the operation of the text mining support device 10. First, the CPU 21 transfers the analysis result 2 output from the text analysis device 5 to the main memory 22. The analysis result 2 is thereby input to the text mining support device 10 (step S101). Next, the CPU 21 creates a scatter diagram from the analysis result 2 (step S102). The scatter diagram is created by plotting the words and variables in a plane whose horizontal axis is the first component and whose vertical axis is the second component. Next, the CPU 21 generates screen data for a basic screen containing the scatter diagram created in step S102 (step S103). The CPU 21 then displays the basic screen on the display unit 25 based on the screen data generated in step S103 (step S104).

FIG. 6 is a diagram showing the basic screen. The basic screen 100 shown in FIG. 6 contains a screen selection window 101 and a scatter diagram window 102. The scatter diagram window 102 shows the scatter diagram of FIG. 3. The screen selection window 101 has six radio buttons 103, referred to below as the first to sixth radio buttons. The first to sixth radio buttons correspond, respectively, to the basic screen, the first to fourth support screens, and "end". While the basic screen 100 is displayed, the user operates the keyboard 28 or the mouse 29 and presses one of the first to sixth radio buttons. An instruction from the user is input in this way.

The CPU 21 accepts the instruction that the user inputs via the screen selection window 101 (step S105). Next, according to the instruction, the CPU 21 proceeds to one of the following steps (step S106). When the instruction is "basic screen" (the first radio button was pressed), the CPU 21 proceeds to step S107 and generates screen data for the basic screen in the same way as in step S103 (step S107). When the instruction is "first support screen" (the second radio button was pressed), the CPU 21 proceeds to step S108 and generates screen data for the first support screen (step S108). When the instruction is "second support screen" (the third radio button was pressed), the CPU 21 proceeds to step S109 and generates screen data for the second support screen (step S109). When the instruction is "third support screen" (the fourth radio button was pressed), the CPU 21 proceeds to step S110 and generates screen data for the third support screen (step S110). When the instruction is "fourth support screen" (the fifth radio button was pressed), the CPU 21 proceeds to step S111 and generates screen data for the fourth support screen (step S111). When the instruction is "end" (the sixth radio button was pressed), the CPU 21 ends the processing.

After executing any one of steps S107 to S111, the CPU 21 proceeds to step S112, where it displays a screen on the display unit 25 based on the screen data generated in that step. The CPU 21 then returns to step S105. In this way, the text mining support device 10 displays, in response to the user's instruction, a screen selected from the basic screen and the first to fourth support screens.
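The loop formed by steps S105 to S112 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the button constants, the `generators` mapping, and the `display` callback are hypothetical names introduced here, and the user's radio-button presses are modeled as a simple sequence.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical numbering of the six radio buttons 103.
BASIC, SUPPORT_1, SUPPORT_2, SUPPORT_3, SUPPORT_4, END = range(1, 7)


def run_selection_loop(
    instructions: Iterable[int],
    generators: Dict[int, Callable[[], str]],
    display: Callable[[str], None],
) -> List[str]:
    """Process radio-button presses until "end" is chosen.

    Returns the screen data that was displayed, in order.
    """
    shown: List[str] = []
    for button in instructions:          # S105: accept the instruction
        if button == END:                # S106: branch; "end" stops the loop
            break
        screen_data = generators[button]()   # one of S107 to S111
        display(screen_data)                 # S112: show the screen
        shown.append(screen_data)            # then back to S105
    return shown
```

Keeping screen generation behind a mapping from button to generator means that adding a further support screen only requires one more entry in the mapping.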

The components of the computer 20 shown in FIG. 4 and the steps shown in FIG. 5 correspond to the components of the text mining support device 10 shown in FIG. 1 as follows. The CPU 21 executing step S101 functions as the analysis result input unit 11. The input unit 24 and the CPU 21 executing step S105 function as the instruction input unit 12. The CPU 21 executing steps S102 to S103 and steps S106 to S111 functions as the screen generation unit 13. The display unit 25 and the CPU 21 executing steps S104 and S112 function as the analysis result display unit 14.

FIG. 7 shows the first support screen. The first support screen 110 shown in FIG. 7 includes the screen selection window 101, a scatter diagram window 112, a word list window 113, and a hint window 114. The first support screen 110 relates to the first piece of knowledge: "words near the origin have no distinctive features." By viewing the first support screen 110, the user can apply the first piece of knowledge to derive insights from the scatter diagram efficiently.

Before the first support screen 110 is displayed, the user operates the keyboard 28 or the mouse 29 to specify the range that is to be regarded as near the origin. An initial value for this range may be determined in advance. The scatter diagram window 112 shows the scatter diagram of FIG. 3, on which a circle 115 indicating the vicinity of the origin is drawn (it appears as an ellipse). The circle 115 is preferably drawn in a color different from that of the scatter diagram (for example, red). In this way, the scatter diagram included in the first support screen 110 uses the circle 115 to show the range near the origin, so the user can look at that range and easily identify the words that have no distinctive features.

The word list window 113 shows a word list in which the words located near the origin (the words inside the circle 115) and their distances from the origin are arranged in ascending order of distance. The upward-pointing triangle in the word list window 113 indicates that the list is sorted from nearest to farthest. The hint window 114 shows the first piece of knowledge under the heading "Key points of analysis". The hint window 114 is placed so that it overlaps the scatter diagram window 112.

The size of the circle 115 may be determined by any method. For example, the user may specify the number of words to be contained in the circle 115 (for example, 10), the proportion of words to be contained in the circle 115 (for example, 10% of all words), or, using the mouse 29 in the first support screen 110, the distance from the origin.
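The three ways of sizing the circle can be sketched as follows, assuming each word is available as a pair of first-component and second-component coordinates. The function names and the dictionary layout are assumptions made for this illustration; the source does not prescribe them. The circle is treated as a plain Euclidean circle in component space.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distances_from(center: Point, points: Dict[str, Point]) -> List[Tuple[float, str]]:
    """Return (distance, name) pairs sorted from nearest to farthest."""
    cx, cy = center
    return sorted((math.hypot(x - cx, y - cy), name) for name, (x, y) in points.items())


def radius_for_count(center: Point, points: Dict[str, Point], count: int) -> float:
    """Smallest radius whose circle contains `count` points (option 1)."""
    ranked = distances_from(center, points)
    if count <= 0 or not ranked:
        return 0.0
    return ranked[min(count, len(ranked)) - 1][0]


def radius_for_ratio(center: Point, points: Dict[str, Point], ratio: float) -> float:
    """Radius containing the given proportion of all points (option 2)."""
    return radius_for_count(center, points, math.ceil(ratio * len(points)))


def words_in_circle(center: Point, points: Dict[str, Point], radius: float) -> List[Tuple[str, float]]:
    """Words inside a circle of the given radius (option 3), nearest first."""
    return [(name, d) for d, name in distances_from(center, points) if d <= radius]
```

Passing the origin `(0.0, 0.0)` as `center` gives the word list of the first support screen; the result of `words_in_circle` is already in the ascending order used by the word list window.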

In the first support screen 110 shown in FIG. 7, the words near the origin (the words inside the circle 115) are displayed in the same form as the other words. Alternatively, the first support screen may display the words near the origin in a form different from the other words (for example, in a lighter color), or may not display them at all. The second to fourth support screens may likewise display in a different form the words that the first support screen displays in a different form, and may omit the words that the first support screen omits.

FIG. 8 shows the second support screen. The second support screen 120 shown in FIG. 8 includes the screen selection window 101, a scatter diagram window 122, a word list window 123, and a hint window 124. The second support screen 120 relates to the second piece of knowledge: "a word located in the direction from the origin toward a variable is strongly associated with that variable and characterizes it." By viewing the second support screen 120, the user can apply the second piece of knowledge to derive insights from the scatter diagram efficiently.

Before the second support screen 120 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable (chapter). The case where the variable "Preamble" is selected is described here. The scatter diagram window 122 shows the scatter diagram of FIG. 3. On that scatter diagram are drawn an arrow 125, which starts at the origin and passes through the selected variable, and two half-lines 126 and 127, which start at the origin and each form a predetermined angle (for example, 10°) with the arrow 125. The words located in the direction from the origin toward the selected variable lie in the region bounded by the half-lines 126 and 127. In this way, the scatter diagram included in the second support screen 120 uses the half-lines 126 and 127 to show the range of directions from the origin toward the selected variable, so the user can look at that range and easily identify the words that characterize the selected variable.

The word list window 123 shows a word list in which the words located in the direction from the origin toward the selected variable (the words in the region bounded by the half-lines 126 and 127) and their distances from the origin are arranged in descending order of distance. The downward-pointing triangle in the word list window 123 indicates that the list is sorted from farthest to nearest. In connection with the second piece of knowledge, the word list window 123 also states: "the farther a word is from the origin, the more strongly it can be judged to be associated." The hint window 124 shows the second piece of knowledge under the heading "Key points of analysis". The hint window 124 is placed so that it overlaps the scatter diagram window 122.

The angle that the arrow 125 forms with each of the half-lines 126 and 127 may be determined by any method, provided that the arrow 125 and the half-lines 126 and 127 lie in the same quadrant. When the half-lines 126 and 127 are drawn from the arrow 125 and a given angle, a half-line that would fall in a quadrant different from that of the arrow 125 is instead drawn on the first component axis or the second component axis. The arrow 125 is preferably drawn in a color different from that of the scatter diagram (for example, red). The half-lines 126 and 127 are preferably drawn in a color different from those of the scatter diagram and the arrow 125 (for example, blue).
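Selecting the words that lie between the two half-lines can be sketched as an angular test around the direction of the selected variable. This is an illustrative sketch under stated assumptions: the function name, the coordinate dictionary, and the default half-angle of 10° are chosen for the example, and the clamping of a half-line to a component axis when it would leave the arrow's quadrant is deliberately left out.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def words_toward_variable(
    variable_xy: Point,
    words: Dict[str, Point],
    half_angle_deg: float = 10.0,
) -> List[Tuple[str, float]]:
    """Words whose direction from the origin is within `half_angle_deg`
    of the direction of the selected variable, farthest from the origin first.
    """
    v_angle = math.atan2(variable_xy[1], variable_xy[0])
    limit = math.radians(half_angle_deg)
    selected: List[Tuple[str, float]] = []
    for name, (x, y) in words.items():
        dist = math.hypot(x, y)
        if dist == 0.0:
            continue  # a word exactly at the origin has no direction
        diff = math.atan2(y, x) - v_angle
        # Wrap the angular difference into [-pi, pi) before comparing.
        diff = (diff + math.pi) % (2.0 * math.pi) - math.pi
        if abs(diff) <= limit:
            selected.append((name, dist))
    # Descending distance, matching the order of the word list window 123.
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected
```

Sorting by descending distance follows the statement in the word list window that words farther from the origin are judged to be more strongly associated with the variable.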

FIG. 9 shows the third support screen. The third support screen 130 shown in FIG. 9 includes the screen selection window 101, a scatter diagram window 132, a word list window 133, and a hint window 134. The third support screen 130 relates to the third piece of knowledge: "words that are close to each other are highly similar." By viewing the third support screen 130, the user can apply the third piece of knowledge to derive insights from the scatter diagram efficiently.

Before the third support screen 130 is displayed, the user operates the keyboard 28 or the mouse 29 to select one word and to specify the range that is to be regarded as the vicinity of the selected word. The case where the word "eye" is selected is described here. The scatter diagram window 132 shows the scatter diagram of FIG. 3, on which a circle 135 indicating the vicinity of the selected word is drawn (it appears as an ellipse). The circle 135 is preferably drawn in a color different from that of the scatter diagram (for example, red). In this way, the scatter diagram included in the third support screen 130 uses the circle 135 to show the range around the selected word, so the user can look at that range and easily identify the words that are highly similar to the selected word.

The word list window 133 shows a word list in which the words located in the vicinity of the selected word (the words inside the circle 135) and their distances from the selected word are arranged in ascending order of distance. The word list window 133 also states, as the third piece of knowledge: "the closer a word is, the more similar it can be judged to be." In this example, the word closest to the selected word "eye" is "face", so "face" is the word most similar to "eye". The hint window 134 shows this conclusion under the heading "Key points of analysis". The hint window 134 is placed so that it overlaps the scatter diagram window 132.

Like the size of the circle 115 in the first support screen 110, the size of the circle 135 may be determined by any method. For example, the user may specify the number of words to be contained in the circle 135, the proportion of words to be contained in the circle 135, or the distance from the selected word.

FIG. 10 shows the fourth support screen. The fourth support screen 140 shown in FIG. 10 includes the screen selection window 101, a scatter diagram window 142, a variable list window 143, and a hint window 144. The fourth support screen 140 relates to the fourth piece of knowledge: "variables that are close to each other are highly similar." By viewing the fourth support screen 140, the user can apply the fourth piece of knowledge to derive insights from the scatter diagram efficiently.

Before the fourth support screen 140 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable. The case where the variable "Preamble" is selected is described here. The scatter diagram window 142 shows the scatter diagram of FIG. 3, on which an arrow 145 is drawn from the selected variable to the variable closest to it. The arrow 145 is preferably drawn in a color different from that of the scatter diagram (for example, red). In this way, the scatter diagram included in the fourth support screen 140 shows the arrow 145 pointing to the variable closest to the selected variable, so the user can look at the arrow 145 and easily identify the variable most similar to the selected variable.

The variable list window 143 shows a variable list in which the variables relatively close to the selected variable and their distances from it are arranged in ascending order of distance. The variable list window 143 also states, as the fourth piece of knowledge: "the closer a variable is, the more similar it can be judged to be." In this example, the variable closest to the selected variable "Preamble" is "Postscript", so "Postscript" is the variable most similar to "Preamble". The hint window 144 shows this conclusion under the heading "Key points of analysis". The hint window 144 is placed so that it overlaps the scatter diagram window 142.
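Ranking the other variables by distance and deriving the endpoints of the arrow can be sketched as below. The function names are hypothetical and the coordinates used in any call are made-up values; only the variable names "Preamble" and "Postscript" come from the example in the text.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def rank_variables(selected: str, variables: Dict[str, Point]) -> List[Tuple[str, float]]:
    """Other variables and their distances from `selected`, nearest first."""
    sx, sy = variables[selected]
    ranked = [
        (name, math.hypot(x - sx, y - sy))
        for name, (x, y) in variables.items()
        if name != selected
    ]
    ranked.sort(key=lambda item: item[1])
    return ranked


def nearest_variable_arrow(selected: str, variables: Dict[str, Point]) -> Optional[Tuple[Point, Point]]:
    """Start and end points of an arrow from `selected` to its nearest variable."""
    ranked = rank_variables(selected, variables)
    if not ranked:
        return None  # there is no other variable to point at
    return variables[selected], variables[ranked[0][0]]
```

The list returned by `rank_variables` could be truncated to the first few entries to obtain the "relatively close" variables shown in the variable list window; how many entries to keep is not specified in the source.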

The text mining support device 10 may also display support screens other than those described above. A support screen may contain any content as long as it includes the scatter diagram and a hint indicating how to read the scatter diagram. A hint may state how to read the scatter diagram explicitly or may merely suggest it. A hint may be placed in any part of the support screen: in a window that overlaps the scatter diagram window, in a window that does not overlap it, or in a message box at a fixed position.

As described above, the text mining support method of this embodiment includes a step of inputting the analysis result 2, a step of inputting an instruction from the user, a step of generating screen data of a screen that includes a graph (scatter diagram) representing the analysis result 2, and a step of displaying the screen based on the screen data. In response to the instruction, the step of generating the screen data generates screen data of a support screen that includes the graph and a hint indicating how to read the graph. Using such a support screen, the user can efficiently derive insights from the graph representing the result of correspondence analysis.

The step of generating the screen data generates screen data of the screen selected, in accordance with the instruction, from among a plurality of support screens (the first support screen 110, the second support screen 120, the third support screen 130, and the fourth support screen 140) and the basic screen 100, which includes the graph but no hint. Selectively displaying a support screen with a hint or the basic screen without one makes it possible to display a screen suited to the user's level of skill. In addition, selectively displaying the plurality of support screens presents the user with several ways of reading the graph.

In the step of inputting the analysis result, a result that associates first items (words) with second items (variables), that is, a result that includes a first component and a second component for each first item and a first component and a second component for each second item, is input as the analysis result 2. The step of generating the screen data creates, as the graph, a scatter diagram in which the first items and the second items are plotted in a plane whose horizontal axis is the first component and whose vertical axis is the second component. The user can therefore efficiently derive insights from a scatter diagram representing the result of correspondence analysis relating the first items and the second items.

The plurality of support screens include: the first support screen 110, whose hint states that first items near the origin of the scatter diagram have no distinctive features; the second support screen 120, whose hint states that a first item located in the direction from the origin toward a second item characterizes that second item; the third support screen 130, whose hint states that first items close to each other in the scatter diagram are highly similar; and the fourth support screen 140, whose hint states that second items close to each other in the scatter diagram are highly similar. Using the hint included in each support screen, the user can efficiently derive insights from the graph representing the result of correspondence analysis.

The scatter diagram included in the first support screen 110 uses the circle 115 to show the range near the origin. The scatter diagram included in the second support screen 120 uses the half-lines 126 and 127 to show the range of directions from the origin toward the selected second item. The scatter diagram included in the third support screen 130 uses the circle 135 to show the range around the selected first item. The scatter diagram included in the fourth support screen 140 shows a symbol (the arrow 145) indicating the second item closest to the selected second item. By looking at the range or symbol shown in each support screen, the user can easily identify the first items that have no distinctive features, the first items that characterize the selected second item, the first items that are highly similar to the selected first item, and the second item that is highly similar to the selected second item.

In the step of inputting the analysis result, the result of correspondence analysis performed on a cross-tabulation table is input as the analysis result, the table having words as the first items, parts of a document as the second items, and the frequency of occurrence of each word in each part of the document as its cell data. The user can therefore efficiently derive insights from a scatter diagram representing the result of correspondence analysis relating words and parts of a document.
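The structure of such a table can be illustrated with a small sketch that counts how often each word occurs in each part of a document. The function name and the whitespace tokenization are assumptions made for this example; real text (particularly Japanese or Chinese) would need a proper tokenizer, and the correspondence analysis itself is not shown.

```python
from collections import Counter
from typing import Dict, List


def build_cross_table(parts: Dict[str, str], vocabulary: List[str]) -> Dict[str, List[int]]:
    """Count occurrences of each vocabulary word in each part of a document.

    Returns one row per part; the columns follow the order of `vocabulary`.
    """
    table: Dict[str, List[int]] = {}
    for part_name, text in parts.items():
        # Naive tokenization: lower-case and split on whitespace (assumption).
        counts = Counter(text.lower().split())
        table[part_name] = [counts[word] for word in vocabulary]
    return table
```

In this layout each row is a second item (a part of the document) and each column is a first item (a word), so every cell holds the frequency of one word in one part.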

The text mining support device 10 and the text mining support program 31 of this embodiment have the same features as the text mining support method of this embodiment and achieve the same effects.

In the above description, the text mining support device 10 displays a scatter diagram that represents the result of correspondence analysis in two dimensions. The invention is not limited to this and can also be applied to a text mining support method and device that display a graph representing the result of correspondence analysis in more dimensions (for example, a three-dimensional graph). Furthermore, in the same way as the text mining support method and device that display a scatter diagram representing the result of correspondence analysis of a cross-tabulation table relating to text data, a data mining support method and device can be configured that display a graph (a scatter diagram, a three-dimensional graph, or the like) representing the result of correspondence analysis of a cross-tabulation table relating to any data other than text data.

According to the text mining support method and device of the present invention, displaying a support screen that includes a graph representing the result of correspondence analysis and a hint indicating how to read the graph allows the user to derive insights from that graph efficiently.

The present invention has been described in detail above, but the description is illustrative in all respects and not restrictive. It is understood that many other changes and modifications can be devised without departing from the scope of the invention.

1‧‧‧Text data
2‧‧‧Analysis result
5‧‧‧Text analysis device
10‧‧‧Text mining support device
11‧‧‧Analysis result input unit
12‧‧‧Instruction input unit
13‧‧‧Screen generation unit
14‧‧‧Analysis result display unit
20‧‧‧Computer
21‧‧‧CPU
22‧‧‧Main memory
23‧‧‧Storage unit
24‧‧‧Input unit
25‧‧‧Display unit
26‧‧‧Communication unit
27‧‧‧Recording medium reading unit
28‧‧‧Keyboard
29‧‧‧Mouse
30‧‧‧Recording medium
31‧‧‧Text mining support program
100‧‧‧Basic screen
101‧‧‧Screen selection window
102, 112, 122, 132, 142‧‧‧Scatter diagram window
103‧‧‧Radio button
110, 120, 130, 140‧‧‧Support screen
113, 123, 133‧‧‧Word list window
143‧‧‧Variable list window
114, 124, 134, 144‧‧‧Hint window
115, 135‧‧‧Circle
125, 145‧‧‧Arrow
126, 127‧‧‧Half-line
S101 to S112‧‧‧Steps

FIG. 1 is a block diagram showing the configuration of a text mining support device according to an embodiment of the present invention.
FIG. 2 shows a cross-tabulation table that is the subject of correspondence analysis.
FIG. 3 shows a scatter diagram created by the text mining support device shown in FIG. 1.
FIG. 4 is a block diagram showing the configuration of a computer that functions as the text mining support device shown in FIG. 1.
FIG. 5 is a flowchart showing the operation of the text mining support device shown in FIG. 1.
FIG. 6 shows the basic screen of the text mining support device shown in FIG. 1.
FIG. 7 shows the first support screen of the text mining support device shown in FIG. 1.
FIG. 8 shows the second support screen of the text mining support device shown in FIG. 1.
FIG. 9 shows the third support screen of the text mining support device shown in FIG. 1.
FIG. 10 shows the fourth support screen of the text mining support device shown in FIG. 1.

Claims

A text mining support method for displaying an analysis result obtained by correspondence analysis, the method comprising: a step of inputting the analysis result; a step of inputting an instruction from a user; a step of generating screen data of a screen including a graph representing the analysis result; and a step of displaying the screen based on the screen data, wherein, in the step of inputting the analysis result, a result in which a first item is associated with a second item, the result including a first component and a second component of the first item and a first component and a second component of the second item, is input as the analysis result, and the step of generating the screen data generates screen data of a screen selected in accordance with the instruction from among a plurality of support screens, each including the graph and a hint indicating how to read the graph, and a basic screen including the graph and not including the hint, and creates, as the graph, a scatter diagram in which the first item and the second item are plotted in a plane having the first component as the horizontal axis and the second component as the vertical axis.

The text mining support method according to claim 1, wherein the plurality of support screens include a first support screen that includes, as the hint, a statement that the first items near the origin in the scatter diagram have no distinctive features.

The text mining support method according to claim 2, wherein the scatter diagram included in the first support screen shows the range near the origin.

The text mining support method according to claim 1, wherein the plurality of support screens include a second support screen that includes, as the hint, a statement that a first item located in the scatter diagram in the direction from the origin toward a second item characterizes that second item.

The text mining support method according to claim 4, wherein the scatter diagram included in the second support screen shows the range of directions from the origin toward the selected second item.

The text mining support method according to claim 1, wherein the plurality of support screens include a third support screen that includes, as the hint, a statement that first items close to each other in the scatter diagram are highly similar.

The text mining support method according to claim 6, wherein the scatter diagram included in the third support screen shows the range around the selected first item.

The text mining support method according to claim 1, wherein the plurality of support screens include a fourth support screen that includes, as the hint, a statement that second items close to each other in the scatter diagram are highly similar.

The text mining support method according to claim 8, wherein the scatter diagram included in the fourth support screen shows a symbol indicating the second item closest to the selected second item.

The text mining support method according to claim 1, wherein, in the step of inputting the analysis result, the result of correspondence analysis performed on a cross-tabulation table is input as the analysis result, the table having words as the first items, parts of a document as the second items, and the frequency of occurrence of each word in each part of the document as its cell data.

The text mining support method according to claim 1, wherein the screen data of the support screen includes, as the hint, a statement of knowledge about how to read the graph.

A text mining support device for displaying an analysis result obtained by correspondence analysis, the device comprising: an analysis result input unit that inputs the analysis result; an instruction input unit that inputs an instruction from a user; a screen generation unit that generates screen data of a screen including a graph representing the analysis result; and an analysis result display unit that displays the screen based on the screen data, wherein the analysis result input unit inputs, as the analysis result, a result in which a first item is associated with a second item, the result including a first component and a second component of the first item and a first component and a second component of the second item, and the screen generation unit generates screen data of a screen selected in accordance with the instruction from among a plurality of support screens, each including the graph and a hint indicating how to read the graph, and a basic screen including the graph and not including the hint, and creates, as the graph, a scatter diagram in which the first item and the second item are plotted in a plane having the first component as the horizontal axis and the second component as the vertical axis.

The text mining support device according to claim 12, wherein the analysis result input unit inputs, as the analysis result, the result of correspondence analysis performed on a cross-tabulation table having words as the first items, parts of a document as the second items, and the frequency of occurrence of each word in each part of the document as its cell data.

The text mining support device according to claim 12, wherein the screen data of the support screen includes, as the hint, a statement of knowledge about how to read the graph.