US8700995B2 - Content conversion system and recording medium storing computer program - Google Patents
- Publication number
- US8700995B2 (application US12/598,503)
- Authority
- US
- United States
- Prior art keywords
- content
- primary
- content data
- layout
- division
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
Definitions
- the present invention relates to a content conversion system and a recording medium storing a computer program.
- Patent documents 1 and 2, Non-patent document 1, and the like.
- a division point in a web page is determined on the basis of a distance between content components in the data description in an HTML document for displaying the web page formed of content components displayed on the screen.
- a determination standard for determining a division point in a web page is created on the basis of a dispersion value of the distance values between content components in the data description in the HTML document.
- content components (c) and (d) are separated from each other on a web page displayed on the screen exemplified in FIG. 11 , and thus it is preferable to determine a division point of the web page to display the content components (c) and (d) on different screens after web page division.
- However, when the division point of the web page is determined on the basis of the distance between the content components in the data description, the division point may be determined so that the content components (c) and (d) are displayed on the same screen even after the division. As a result, a display result which does not correspond to the display layout of the original web page before the division may be produced.
- An object of the present invention is to provide a content conversion system contributing to appropriate division according to a display layout of the original contents before division at the time of dividing the contents and providing the contents to a mobile terminal or the like, in a case where the contents such as web pages are formed of content components such as images, texts, and hyperlinks, and the display layout of the content components is designated using a tag description such as in HTML.
- Another object of the present invention is to provide a recording medium storing a computer program to realize the content conversion system of the present invention using a computer.
- a content conversion system divides content data in which a display layout of content components is described using tags, so as to display, on a terminal, contents formed of the content components displayed on a screen, and includes: a primary division unit that divides the content data on a basis of the display layout of the content components; and a secondary division unit that divides the content data on a basis of a distance between the content components in a data description, the secondary division unit divides primary divided content data divided by the primary division unit, and the primary division unit performs re-division of the primary divided content data according to number of divisions made by the secondary division unit.
- the primary division unit may calculate a density obtained by dividing a display area related to the primary divided content data by the number of divisions made by the secondary division unit, and may repeat the re-division until the density satisfies a predetermined condition.
- the primary division unit may calculate the display area related to the primary divided content data using a tag attribute value representing size of the content components.
- the content conversion system may further include: a layout related tag determining unit that determines whether or not the content data includes a layout related tag, and only dividing by the secondary division unit may be performed when the content data includes no layout related tag.
- the layout related tag determining unit may determine whether or not a description of the layout related tag in the content data is grammatically correct, and only dividing by the secondary division unit may be performed when the description of the layout related tag is grammatically incorrect.
- a recording medium stores a computer program for performing content conversion of dividing content data in which a display layout of content components is described using tags, so as to display, on a terminal, contents formed of the content components displayed on a screen
- the computer program includes: a primary division function that divides the content data on a basis of the display layout of the content components; and a secondary division function that divides the content data on a basis of a distance between the content components in a data description, the secondary division function divides primary divided content data divided by the primary division function, and the primary division function performs re-division of the primary divided content data according to number of divisions made by the secondary division function.
- According to the present invention, it is possible to obtain an advantage of contributing to appropriate division according to the display layout of the original contents before division at the time of dividing the contents and providing the contents to a mobile terminal or the like, in a case where the contents such as web pages are formed of content components such as images, texts, and hyperlinks, and the display layout of the content components is designated using a tag description such as HTML.
- FIG. 1 is a block diagram illustrating a configuration of a content conversion server according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating a flow of processes in the content conversion server in FIG. 1 .
- FIG. 3 is a flowchart illustrating a flow of a process in a layout related tag determining unit in FIG. 1 .
- FIG. 4 is a flowchart illustrating a flow of a primary division process in a primary division unit in FIG. 1 .
- FIG. 5 is a flowchart illustrating a flow of the primary division process in the primary division unit in FIG. 1 .
- FIG. 6 is a flowchart illustrating a sequence of a calculation method of a display area according to an embodiment of the present invention.
- FIG. 7 is an example of a web page display screen.
- FIG. 8 is an example of a configuration of an HTML document corresponding to the web page in FIG. 7 .
- FIG. 9 is a schematic web page display screen for explaining effects according to the present invention.
- FIG. 10 is a graph diagram explaining a distance between contents according to the embodiment of the present invention.
- FIG. 11 is an example of a web page display screen.
- FIG. 12 is an example of a configuration of an HTML document corresponding to the web page in FIG. 11.
- Contents according to the present invention are formed of content components such as images, texts, and hyperlinks, and a display layout of the content components is designated using a tag description such as HTML.
- a web page will be described as an example of the contents according to the present invention.
- An HTML document will be described as an example of data designating the display layout of the content components displayed on the web page.
- FIG. 1 is a block diagram illustrating a configuration of a content conversion server 1 according to an embodiment of the present invention.
- the content conversion server 1 includes a content data acquiring unit 11 , a layout related tag determining unit 12 , a primary division unit 13 , a secondary division unit 14 , and a reconfiguration unit 15 .
- the content conversion server 1 is connected to a communication network such as the internet.
- the content conversion server 1 can access a web server 2 provided on the communication network, and acquire, from the web server 2, content data for displaying a web page provided by the web server 2.
- the content data includes the content components displayed on the web page, and the HTML document designating the display layout of the content components.
- As the content components, for example, there are images, texts, hyperlinks, and the like.
- the content components may be incorporated in the HTML document or may be provided as a file separate from the HTML document.
- texts and hyperlinks are generally incorporated in the HTML document.
- images are generally provided as a file separate from the HTML document, and the HTML document includes information (e.g., a URL (Uniform Resource Locator)) representing the location of the image file.
- the content conversion server 1 can transmit and receive data to and from a terminal 3 through the communication network.
- the terminal 3 may be a mobile terminal for wireless communication or a fixed terminal for wired communication.
- a mobile phone terminal that is a registered terminal of a mobile phone network may be used as the terminal 3 .
- the terminal 3 has a browser 31 for browsing various web pages.
- the browser 31 acquires content data of the web page that is a browsing target by the terminal 3 through the communication network according to a web page browsing operation of a user of the terminal 3 , and displays the web page on a display device of the terminal 3 on the basis of the acquired content data.
- the content data acquiring unit 11 receives a web page acquisition request from the browser 31 operated by the terminal 3 , and acquires the content data from the web server 2 in response to the request.
- the layout related tag determining unit 12 analyzes a structure of the tag described in the HTML document in the content data, and creates a tree representing the hierarchical structure of the layout related tags. The tree is transmitted to the primary division unit 13 .
- the layout related tag is a tag usable for the display layout of the content components.
- As the layout related tags in HTML, for example, there are table related tags such as <table>, <tr>, and <td>, and a layout block definition related tag such as <div>.
- the primary division unit 13 divides the HTML document on the basis of the density of a web page display area.
- the primary division unit 13 divides the HTML document into the former part and the latter part with transition of the structure of the layout related tag as a boundary.
- the primary division unit 13 determines a division point of the HTML document to suppress the density of the web page display area corresponding to the HTML document after division to be lower than a predetermined value. On this occasion, a division result of the secondary division unit 14 is considered.
- the primary division unit 13 performs general division of the HTML document based on the display layout of the content components.
- the secondary division unit 14 divides the HTML document on the basis of a distance between the content components in the data description in the HTML document.
- the reconfiguration unit 15 reconfigures complete HTML documents by performing processing such as the addition of headers to the HTML documents divided by the primary division unit 13 and the secondary division unit 14. Each of the reconfigured HTML documents corresponds to one web page.
- the reconfiguration unit 15 returns the reconfigured HTML document with the content components to the terminal 3 in order in response to the request from the browser 31 . Accordingly, the terminal 3 receives the HTML document and the content components transmitted from the content conversion server 1 , and can display the divided web pages.
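- The reconfiguration performed by the reconfiguration unit 15 can be sketched as follows. This is a minimal sketch only: the function name `reconfigure`, the header contents, and the page-numbered title are illustrative assumptions, since the text states only that headers and the like are added so that each divided HTML document becomes a complete document.

```python
def reconfigure(fragments, title="Divided page"):
    """Wrap each divided HTML fragment in a minimal complete HTML document.

    Hypothetical helper: the header template and the "(index/total)" title
    are assumptions made for illustration, not details given in the text.
    """
    total = len(fragments)
    pages = []
    for index, fragment in enumerate(fragments, start=1):
        pages.append(
            "<html><head><title>{} ({}/{})</title></head>"
            "<body>{}</body></html>".format(title, index, total, fragment)
        )
    return pages


if __name__ == "__main__":
    for page in reconfigure(["<p>first part</p>", "<p>second part</p>"]):
        print(page)
```

- Each returned string corresponds to one web page after the division, which could then be returned to the terminal 3 in order.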
- FIG. 2 is a flowchart illustrating a flow of a process in the content conversion server 1 shown in FIG. 1 .
- the content data acquiring unit 11 acquires content data from the web server 2 .
- the layout related tag determining unit 12 analyzes the structure of the tags described in the HTML document in the content data.
- the layout related tag determining unit 12 detects the layout related tags in the HTML document from a shallow hierarchy to a deep hierarchy in order, and creates a tree representing the hierarchical structure of the layout related tags. The tree is used in the primary division unit 13 .
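- As an illustration of such a tree, the following sketch collects the layout related tags of an HTML document with Python's standard `html.parser` module. The class and function names are hypothetical, and the depth is counted among layout related tags only, which is an assumption; the text does not state how the hierarchy number is counted.

```python
from html.parser import HTMLParser

# Layout related tags named in the text; other tags are ignored here.
LAYOUT_TAGS = {"table", "tr", "td", "div"}


class LayoutTreeBuilder(HTMLParser):
    """Collect layout related tags into a nested tree (hypothetical structure)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "depth": 0, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in LAYOUT_TAGS:
            node = {"tag": tag, "depth": len(self._stack), "children": []}
            self._stack[-1]["children"].append(node)
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open layout tag.
        if tag in LAYOUT_TAGS and len(self._stack) > 1 and self._stack[-1]["tag"] == tag:
            self._stack.pop()


def build_layout_tree(html):
    builder = LayoutTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tags_at_depth(node, depth):
    """Return the tag names found at the given depth (1 = shallowest)."""
    if node["depth"] == depth:
        return [node["tag"]]
    found = []
    for child in node["children"]:
        found.extend(tags_at_depth(child, depth))
    return found


if __name__ == "__main__":
    tree = build_layout_tree("<table><tr><td>a</td><td>b</td></tr></table><div>c</div>")
    print(tags_at_depth(tree, 1), tags_at_depth(tree, 3))
```

- Looking up the tags at a given depth in this way corresponds to searching the layout related tags from a shallow hierarchy to a deep hierarchy.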
- In Step S2, the layout related tag determining unit 12 determines whether a layout related tag exists in the HTML document and whether the layout related tag is free of grammatical errors.
- When the determination result is YES, the process proceeds to Steps S3 and S4. When the determination result is NO, the process proceeds to Step S5.
- In Steps S3 and S4, a layout related tag exists in the HTML document that is the division target and there is no grammatical error in the layout related tag. Thus, the division (primary division) of the HTML document based on the layout related tags and the density of the web page display area and the division (secondary division) of the HTML document based on the distance between the content components in the data description are performed in combination.
- In Step S3, the primary division unit 13 performs the division (primary division) of the HTML document on the basis of the density of the web page display area.
- the HTML document is divided into the former part and the latter part with a change in the structure of the layout related tag as a boundary.
- In Step S4, the HTML documents after the primary division performed by the primary division unit 13 (primary divided HTML documents) are set as the division target, and the secondary division unit 14 further divides the primary divided HTML documents on the basis of the distance between the content components in the data description (secondary division).
- the secondary division unit 14 transmits the number of divisions Nk for each primary divided HTML document to the primary division unit 13 .
- When the primary division unit 13 receives the number of divisions Nk for each primary divided HTML document from the secondary division unit 14, it performs the process of Step S3 again. In the process of Step S3, the primary division unit 13 calculates a density of the web page display area for each primary divided HTML document on the basis of the number of divisions Nk. When the density is equal to or higher than a predetermined value, the primary division unit 13 further divides the primary divided HTML document. The divided HTML documents are treated as new primary divided HTML documents, and the process proceeds to Step S4.
- Steps S 3 and S 4 are repeated until all the primary divided HTML documents satisfy the density condition of the web page display area, that is, until the densities of the web page display areas corresponding to all the primary divided HTML documents are lower than a predetermined value.
- the primary division unit 13 receives all the results of the primary division at that time from the secondary division unit 14 and transmits the HTML documents obtained by secondarily dividing the primary divided HTML documents to the reconfiguration unit 15 .
- In Step S5, the HTML document that is the division target has no layout related tag or has a grammatical error in the layout related tag. Therefore, the division of the HTML document using the layout related tag as a boundary is not performed, and only the division of the HTML document based on the distance between the content components in the data description is performed. Accordingly, in Step S5, the secondary division unit 14 divides the HTML document on the basis of the distance between the content components in the data description of the HTML document that is the division target. The secondary division unit 14 transmits the divided HTML documents to the reconfiguration unit 15.
- In Step S6, the reconfiguration unit 15 performs processing such as the addition of headers on each of the HTML documents divided in Steps S3 and S4 or on each of the HTML documents divided in Step S5, and reconfigures them into complete HTML documents.
- the reconfigured HTML documents are transmitted with the content components to the terminal 3 .
- FIG. 3 is a flowchart illustrating a flow of the process in the layout related tag determining unit 12 shown in FIG. 1 .
- In Step S11, the HTML document (HTML file) in the content data acquired from the web server 2 by the content data acquiring unit 11 is acquired.
- In Step S12, the HTML document is searched for layout related tags in order from a shallow hierarchy to a deep hierarchy.
- In Step S13, it is determined whether or not a layout related tag exists in the HTML document. When there is a layout related tag, the process proceeds to Step S14. Meanwhile, when there is no layout related tag, the process proceeds to Step S17.
- In Step S14, a tree is created in which the layout related tags, detected in order from the shallow hierarchy to the deep hierarchy, are positioned at the hierarchies where they were detected.
- the tree representing the hierarchical structure of the layout related tags is used in the primary division unit 13 .
- In Step S15, it is determined whether or not the description of the layout related tag is grammatically correct. The description is determined to be correct when it conforms to all the following regulations.
- When the description of the layout related tag is grammatically correct, the process proceeds to Step S16. On the other hand, when the description of the layout related tag is not grammatically correct, the process proceeds to Step S17.
- In Step S16, since a layout related tag exists and there is no grammatical error in the layout related tag, it is determined that the HTML document can be divided using the layout related tag as a boundary.
- the division of the HTML document based on the density of the web page display area sectioned by the layout related tag and the division of the HTML document based on the distance between the content components in the data description are performed in combination.
- In Step S17, since either there is no layout related tag or there is a grammatical error in the layout related tag, it is determined that the division of the HTML document using the layout related tag as a boundary is impossible. Thus, the division of the HTML document using the layout related tag as a boundary is not performed, and only the division of the HTML document based on the distance between the content components in the data description is performed.
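- The determination of Steps S13 to S17 can be sketched as follows. Because the grammatical regulations themselves are not reproduced here, the sketch substitutes a simple stand-in check, namely that every layout related tag is explicitly closed and properly nested. That check, the class name, and the returned mode strings are all assumptions made for illustration.

```python
from html.parser import HTMLParser

LAYOUT_TAGS = {"table", "tr", "td", "div"}  # tags named in the text


class LayoutTagChecker(HTMLParser):
    """Record whether layout related tags exist and whether they nest cleanly."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.balanced = True
        self._open = []

    def handle_starttag(self, tag, attrs):
        if tag in LAYOUT_TAGS:
            self.found = True
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in LAYOUT_TAGS:
            if self._open and self._open[-1] == tag:
                self._open.pop()
            else:
                self.balanced = False

    def is_well_formed(self):
        # Stand-in for the grammar check: no mismatched and no unclosed tags.
        return self.balanced and not self._open


def choose_division_mode(html):
    checker = LayoutTagChecker()
    checker.feed(html)
    checker.close()
    if checker.found and checker.is_well_formed():
        return "primary+secondary"  # corresponds to Step S16
    return "secondary-only"         # corresponds to Step S17


if __name__ == "__main__":
    print(choose_division_mode("<table><tr><td>x</td></tr></table>"))
    print(choose_division_mode("<p>no layout tags</p>"))
```

- Note that this stand-in is stricter than HTML itself, which permits some end tags such as </td> to be omitted.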
- FIG. 4 and FIG. 5 are flowcharts illustrating flows of the primary division process in the primary division unit 13 shown in FIG. 1 .
- a variable i is initialized to 1.
- the variable i represents a depth of a hierarchy on the tree representing the hierarchical structure of the layout related tag concerning the HTML document that is the division target.
- When the variable i has its initial value “1”, the value represents the shallowest first hierarchy.
- When the variable i is “2”, the value represents the next shallowest second hierarchy. That is, the variable i represents the i-th hierarchy in order of shallowness.
- In Step S22, the HTML document that is the division target is searched for the layout related tag in the i-th hierarchy, and the tag is extracted. When the layout related tag is detected, the process proceeds to Step S24. Meanwhile, when the layout related tag is not detected, the process proceeds to Step S29 shown in FIG. 5.
- In Step S24, the HTML document is divided using the layout related tag detected in Step S22 as a boundary.
- the number of divisions is set to be Mi. Accordingly, Mi HTML documents after the primary division (primary divided HTML documents) are created.
- In Step S25, the Mi primary divided HTML documents are transmitted to the secondary division unit 14.
- a variable j is initialized to 1.
- the variable j represents a number of the primary divided HTML document in the i-th hierarchy.
- the initial value “1” of the variable j represents the first primary divided HTML document in the i-th hierarchy.
- When the variable j is “2”, the value represents the second primary divided HTML document in the i-th hierarchy. That is, the variable j represents the j-th primary divided HTML document in the i-th hierarchy.
- the variable j is a value from 1 to Mi.
- In Step S28, a density Dj concerning the j-th primary divided HTML document in the i-th hierarchy is calculated, and the density Dj is compared with a predetermined value Db to determine whether or not the density Dj is less than the predetermined value Db.
- Density Dj = Display Area Sj / Number of Divisions Nk
- the density Dj is an index representing whether or not the j-th primary divided HTML document in the i-th hierarchy can be appropriately divided by the secondary division of the secondary division unit 14.
- When the density Dj is high, that is, when the number of secondary divisions is small relative to the size of the display area, the primary division is insufficient.
- In that case, the primary division is performed again using the layout related tag of the hierarchy one stage deeper as a boundary, to achieve the optimal division combining the primary division with the secondary division.
- When the density Dj concerning the j-th primary divided HTML document in the i-th hierarchy is lower than the predetermined value Db, the process proceeds to Step S29. Meanwhile, when the density Dj is equal to or higher than the predetermined value Db, the process proceeds to Step S30.
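- The density check of Step S28 amounts to the following arithmetic. This is a minimal sketch: the function names and the numbers in the usage example are hypothetical, and the sketch assumes that the number of divisions Nk is at least 1.

```python
def density(display_area, num_divisions):
    """Density Dj = display area Sj / number of divisions Nk."""
    if num_divisions < 1:
        # Assumption: Nk counts at least one part.
        raise ValueError("number of divisions must be at least 1")
    return display_area / num_divisions


def needs_redivision(display_area, num_divisions, threshold):
    """True when Dj is equal to or higher than Db (the branch to Step S30)."""
    return density(display_area, num_divisions) >= threshold


if __name__ == "__main__":
    # Hypothetical values: Sj = 12000 pixels, Db = 3000 pixels per division.
    print(needs_redivision(12000, 4, 3000))
    print(needs_redivision(12000, 5, 3000))
```

- With the same display area, a larger number of secondary divisions lowers the density, so a primary divided HTML document that the secondary division already splits finely is not divided further.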
- In Step S29, since the density Dj concerning the j-th primary divided HTML document in the i-th hierarchy is lower than the predetermined value Db, 1 is added to the variable j in order to examine whether or not the density concerning the next, (j+1)-th primary divided HTML document in the i-th hierarchy is lower than the predetermined value Db.
- In Step S31, it is determined, by comparing the variable j with “Mi”, whether or not the density Dj has been examined for all Mi primary divided HTML documents in the i-th hierarchy.
- When the examination is completed for all of them, the process proceeds to Step S32. Meanwhile, when there are any primary divided HTML documents which have not been examined yet, the process returns to Step S28.
- In Step S32, the variable i is compared with “1”. In the case of a hierarchy deeper than the first hierarchy, the process proceeds to Step S33. Meanwhile, in the case of the first hierarchy, the process is ended.
- In Step S33, 1 is subtracted from the variable i to make the hierarchy shallower by one stage, and the process returns to Step S29.
- In Step S30, the density Dj concerning the j-th primary divided HTML document in the i-th hierarchy is equal to or higher than the predetermined value Db. Accordingly, 1 is added to the variable i to make the hierarchy deeper by one stage, and the process returns to Step S22 shown in FIG. 4.
- the primary divisions of the HTML document are repeated by the processes shown in FIG. 4 and FIG. 5 , until the density Dj concerning all the primary divided HTML documents is lower than the predetermined value Db.
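- The overall effect of this repetition can be sketched with a recursive simplification. It is not the counter-based flow of FIG. 4 and FIG. 5 itself: the nested-dictionary block structure and the callback `count_secondary_divisions`, which stands in for the secondary division unit 14, are assumptions made for illustration.

```python
def primary_divide(block, count_secondary_divisions, threshold):
    """Return the blocks at which the primary division stops.

    block: dict with "area" (display area) and "children" (nested layout
    blocks one hierarchy deeper). Both keys are hypothetical names.
    """
    # Assumption: treat a reported count below 1 as a single part.
    n = max(1, count_secondary_divisions(block))
    if block["area"] / n < threshold or not block["children"]:
        # Density is low enough, or there is no deeper layout tag to split at.
        return [block]
    result = []
    for child in block["children"]:
        result.extend(primary_divide(child, count_secondary_divisions, threshold))
    return result


if __name__ == "__main__":
    page = {
        "name": "page", "area": 10000, "n": 1,
        "children": [
            {"name": "A", "area": 6000, "n": 3, "children": []},
            {"name": "B", "area": 4000, "n": 1, "children": [
                {"name": "B1", "area": 2000, "n": 1, "children": []},
                {"name": "B2", "area": 2000, "n": 1, "children": []},
            ]},
        ],
    }
    parts = primary_divide(page, lambda b: b["n"], 3000)
    print([p["name"] for p in parts])
```

- In this example, block A is kept whole because the secondary division already splits it into three parts, while block B is divided one hierarchy deeper.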
- FIG. 6 is a flowchart illustrating a sequence of the method of calculating the display area Sj according to the present embodiment.
- a pointer i is initialized to 1.
- the pointer i represents a number of the layout related tag of the primary divided HTML document that is the calculation target of the display area Sj.
- the initial value “1” of the pointer i represents the first layout related tag of the primary divided HTML document.
- When the pointer i is “2”, the value represents the second layout related tag of the primary divided HTML document. That is, the pointer i represents the i-th layout related tag of the primary divided HTML document.
- the pointer i shown in FIG. 6 is different from the variable i shown in FIG. 4 and FIG. 5 .
- In Step S42, the primary divided HTML document that is the display area calculation target is searched, and the i-th layout related tag is detected.
- In Step S43, when the detected i-th layout related tag is “<table>”, the process proceeds to Step S44. Meanwhile, when the detected i-th layout related tag is not “<table>”, the process proceeds to Step S45.
- In Step S44, it is determined whether or not a “height attribute” and a “width attribute” are added to the <table> tag. When there are both a “height attribute” and a “width attribute”, the process proceeds to Step S46. Meanwhile, when the “height attribute” or the “width attribute” is missing, the process proceeds to Step S45.
- In Step S46, a partial display area Si concerning a table is calculated from the “height attribute” and the “width attribute” added to the <table> tag.
- the “height attribute” is the number of pixels corresponding to the height of the table.
- the “width attribute” is the number of pixels corresponding to the width of the table.
- In Step S47, the pointer i is moved to the </table> tag. Then, the process proceeds to Step S48.
- When the i-th layout related tag is not “<table>” in Step S43, the partial display area Si concerning texts and images corresponding to the i-th layout related tag is calculated in Step S45.
- In Step S45, the partial display area Si of a hyperlink described as text in the HTML document is also calculated as a text.
- the partial display area Si concerning texts is calculated using a “size attribute” added to a <font> tag modifying texts (including hyperlinks).
- the “size attribute” is the number of pixels corresponding to the font size.
- the font size is acquired from a style sheet.
- In Step S48, it is determined whether or not a layout related tag still exists.
- When a layout related tag exists, 1 is added to the pointer i in Step S49 and the process returns to Step S42. Meanwhile, when the process is completed for all the layout related tags of the primary divided HTML document, the process proceeds to Step S50.
- In Step S50, the partial display areas Si concerning tables, texts, and images are summed, and the sum value is taken as the display area Sj concerning the primary divided HTML document.
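- The calculation of FIG. 6 can be sketched with Python's standard `html.parser` module as follows. Several points are assumptions made for illustration and are not taken from the text: each character is taken to occupy a square of the font size, an image contributes its `width` × `height` attributes, the default font size is 16 pixels, and all names are hypothetical.

```python
from html.parser import HTMLParser


def _is_pixels(value):
    return value is not None and value.isdigit()


class DisplayAreaEstimator(HTMLParser):
    """Sum partial display areas of tables, texts, and images (rough sketch)."""

    def __init__(self, default_font_px=16):
        super().__init__()
        self.area = 0
        self._font_stack = [default_font_px]
        self._skip_depth = 0  # > 0 while inside a <table> already counted by size

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._skip_depth:
            if tag == "table":
                self._skip_depth += 1
            return
        has_size = _is_pixels(attrs.get("width")) and _is_pixels(attrs.get("height"))
        if tag == "table" and has_size:
            # Steps S44, S46, S47: use the attributes, then skip to </table>.
            self.area += int(attrs["width"]) * int(attrs["height"])
            self._skip_depth = 1
        elif tag == "img" and has_size:
            self.area += int(attrs["width"]) * int(attrs["height"])
        elif tag == "font":
            size = attrs.get("size")
            self._font_stack.append(int(size) if _is_pixels(size) else self._font_stack[-1])

    def handle_endtag(self, tag):
        if self._skip_depth:
            if tag == "table":
                self._skip_depth -= 1
            return
        if tag == "font" and len(self._font_stack) > 1:
            self._font_stack.pop()

    def handle_data(self, data):
        if not self._skip_depth:
            # Assumption: one character occupies font_px x font_px pixels.
            self.area += len(data.strip()) * self._font_stack[-1] ** 2


def estimate_display_area(html, default_font_px=16):
    estimator = DisplayAreaEstimator(default_font_px)
    estimator.feed(html)
    estimator.close()
    return estimator.area


if __name__ == "__main__":
    print(estimate_display_area('<table width="100" height="50"><tr><td>x</td></tr></table>'))
```

- A table without both attributes is not skipped, so its texts and images are counted individually, which mirrors the branch from Step S44 to Step S45.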
- The primary division of the HTML document, which is based on the density of the web page display area and uses the layout related tag as a boundary, is combined with the secondary division of the HTML document based on the distance between the content components in the data description. Further, the densities of the web page display areas corresponding to all the primary divided HTML documents are made lower than the predetermined value.
- FIG. 8 shows an HTML document H 40 corresponding to the web page shown in FIG. 7 .
- the content component (ijkl) is laid out using a <tr> tag of the third hierarchy belonging to a <table> tag of the first hierarchy.
- the content components (abcde) and (xyz) are laid out using two <td> tags of the sixth hierarchy belonging to a <table> tag of the fourth hierarchy.
- the content component (pqrs) is laid out using a <td> tag of the sixth hierarchy belonging to the other <table> tag of the fourth hierarchy.
- the HTML document H40 is primarily divided into two primary divided HTML documents using the <table> tag of the fourth hierarchy as a boundary.
- Each of the primary divided HTML documents is subjected to the secondary division on the basis of the distance between the content components in the data description, and the primary division is additionally performed on the primary divided HTML documents according to the densities of the primary divided HTML documents calculated using the number of divisions made by the secondary division.
- the densities of the web page display areas corresponding to all the primary divided HTML documents are limited to be lower than the predetermined value by the primary division and the secondary division. Furthermore, the division based on the distance between the content components in the data description is performed on the primary divided HTML documents. Accordingly, it is possible to appropriately adjust the display density of the content components per page in the web pages after the division, and each of the web pages after the division is easy to view when displayed on the terminal. This is because the division follows the display layout of the original web page before the division.
- a web page display screen 32 shown in FIG. 9 is divided into areas represented by broken lines shown in FIG. 9 .
- When the display densities of the content components per page differ from one another in the divided web pages, the web pages are not easy to view and do not correspond to the layout of the original web page before the division.
- a screen is relatively small such as a mobile phone
- the web page is divided into areas represented by solid lines shown in FIG.
- Patent document 2 may be applied to the secondary division unit 14 .
- the secondary division unit 14 determines a division point of an HTML document on the basis of a distance between content components of the HTML document in the HTML description.
- the distance between the content components is obtained by integrating the nest depths of all tags described between two content components in the HTML document.
- the nest depth of the tags represents a division degree of the display layout in the web page.
- the distance between content components closely related to each other on the display layout of the web page becomes short. Meanwhile, the distance between content components less related to each other becomes long. This tendency is particularly strong in web pages that realize a complicated layout using, for example, table tags in multiple stages.
- the division point in the HTML document is determined considering that the longer the distance between the content components is, the less the content components are related to each other.
- FIG. 10 is a graph diagram explaining the distance between the content components.
- the horizontal axis represents a tag sequence (x), and the vertical axis represents a nest depth (y) of tags.
- the distance S (a, b) between the content components 101 and 102 is calculated.
- the distance S (a, b) between the content components is calculated by the equation (1).
- x a is the tag sequence of the content component 101
- y a is the nest depth of the content component 101
- x b is the tag sequence of the content component 102
- y b is the nest depth of the content component 102
- f(x) is a function of providing the nest depth (y) of a tag corresponding to the tag sequence (x).
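The distance described above can be sketched in a few lines. This is a minimal illustration, not the patent's equation (1), which is not reproduced in this text: it assumes the "integration" is a discrete sum of f(x) over the tags strictly between the two content components, and that no baseline is subtracted. The function name `content_distance` and the list `nest_depths` (standing in for f(x)) are hypothetical.

```python
from typing import Sequence


def content_distance(nest_depths: Sequence[int], x_a: int, x_b: int) -> int:
    """Sum the nest depths f(x) of the tags lying strictly between x_a and x_b.

    nest_depths[x] plays the role of f(x): the nest depth of the tag at
    tag-sequence position x. Treating the endpoints as excluded is an
    assumption made for this sketch.
    """
    lo, hi = sorted((x_a, x_b))
    return sum(nest_depths[x] for x in range(lo + 1, hi))


if __name__ == "__main__":
    # Hypothetical nest depths for eight consecutive tags.
    depths = [1, 2, 3, 3, 2, 1, 2, 3]
    print(content_distance(depths, 2, 6))
```

Under this reading, two components separated by many deeply nested tags receive a large distance, which matches the statement that loosely related components lie far apart.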
- the secondary division unit 14 calculates the distances between all the content components in the HTML document.
- the secondary division unit 14 determines the division point of the HTML document by comparing the magnitudes of the calculated distances between the content components.
- the secondary division unit 14 uses division parameters (threshold values N 1 and N 2 , N 1 >N 2 ) for the determination standard of the distance between the content components.
- the division parameters (threshold values N 1 and N 2 ) are the determination standard of the distance between the content components to determine the division point of the HTML document.
- the sequence (Steps S 111 to S 115 ) of determining the division point of the HTML document is described below.
- Step S 112 when the maximum value (Smax) of the distance between the content components in the content object is equal to or more than N 1 times the average value (Saverage) of the distance between the content components in the content object, the location between the content components corresponding to the maximum value (Smax) is determined as the division point.
- Step S 113 in the case where the determination by the threshold value N 1 in Step S 112 is not YES, when the maximum value (Smax) is equal to or more than N 2 times the average value (Saverage) and the number of the content components in one content object after the division is equal to or more than the threshold value M, the location between the content components corresponding to the maximum value (Smax) is determined as the division point.
- Step S 115 when no new division point of the content object is found in Step S 112 or Step S 113 , the process is ended.
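The threshold tests of Steps S 112, S 113, and S 115 can be sketched as a recursive split. Steps S 111 and S 114 are not reproduced in this text, so the sketch assumes that the whole document starts as one content object and that each object produced by a split is examined again. It also assumes that the threshold M applies to both objects produced by a split. The function name `find_division_points` and its parameters are hypothetical.

```python
from typing import List


def find_division_points(distances: List[float], n1: float, n2: float, m: int) -> List[int]:
    """Return sorted indices i where a split falls between component i and i + 1.

    distances[i] is the distance between component i and component i + 1,
    so there are len(distances) + 1 content components in total.
    """
    points: List[int] = []

    def split(start: int, end: int) -> None:
        # The content object holds components start..end inclusive,
        # so its internal gaps are distances[start:end].
        gaps = distances[start:end]
        if not gaps:
            return  # a single component cannot be divided
        s_max = max(gaps)
        s_average = sum(gaps) / len(gaps)
        if s_average == 0:
            return  # all gaps are zero: no meaningful maximum
        cut = start + gaps.index(s_max)
        left_size = cut - start + 1
        right_size = end - cut
        # Step S112: the maximum gap clearly dominates the average.
        strong = s_max >= n1 * s_average
        # Step S113: weaker dominance, accepted only if both parts stay large enough.
        weak = s_max >= n2 * s_average and left_size >= m and right_size >= m
        if not (strong or weak):
            return  # Step S115: no new division point, stop here
        points.append(cut)
        split(start, cut)
        split(cut + 1, end)

    split(0, len(distances))
    return sorted(points)


if __name__ == "__main__":
    print(find_division_points([1, 1, 10, 1, 1], n1=3.0, n2=1.5, m=2))
```

Using the ratio of Smax to Saverage rather than an absolute distance keeps the test independent of how deeply a particular page nests its tags.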
- the secondary division unit 14 performs the above division point determining processes (Steps S 111 to S 115 ) on the primary divided HTML document received from the primary division unit 13 as a target, and determines the division point of the primary divided HTML document.
- the secondary division unit 14 divides the primary divided HTML document according to the division point.
- the secondary division unit 14 transmits the number of divisions Nk concerning the primary divided HTML document to the primary division unit 13 .
- the division parameters may be predetermined fixed values, or appropriate threshold values N 1 and N 2 may be dynamically calculated for each web page.
- a method of dynamically calculating the division parameters (threshold values N 1 and N 2 ) is described in Non-patent document 1.
- the content conversion server 1 may be realized as an individual device as shown in FIG. 1 , or may be provided in the web server 2 or the terminal 3 .
- the content conversion server 1 may be realized by dedicated hardware, or may be configured by a general-purpose computer system such as a personal computer to realize the functions by executing a program for realizing the functions of the content conversion server 1 shown in FIG. 1 .
- the content conversion process may be performed by recording the program for realizing the functions of the content conversion server 1 shown in FIG. 1 in a computer-readable recording medium, and causing the computer system to read and execute the program recorded in the recording medium.
- the “computer system” may include an OS or hardware such as peripheral devices.
- the “computer system” also includes a home page providing environment (or a display environment) when using a WWW system.
- the “computer-readable recording medium” includes writable non-volatile memories such as a flexible disk, a magneto-optical disk, a ROM, and a flash memory, portable media such as a CD-ROM, and storage devices such as a hard disk built into a computer system.
- the “computer-readable recording medium” includes those that hold a program for a fixed time, such as a volatile memory (e.g. a dynamic random access memory (DRAM) etc.) in computer systems that become servers and clients when a program is transmitted via a communication line such as a telephone line or a network such as the Internet.
- the program may be transmitted from a computer system where it was stored in a storage device to another computer system via a transmission medium, or by transmission waves in a transmission medium.
- a ‘transmission medium’ for transmitting the program denotes a medium having a function of transmitting information such as a network (communication network) such as the Internet and a communication wire (communication line) such as a telephone line.
- the present invention can be applied to the content conversion system.
- in the content conversion system, when contents such as web pages are formed of content components such as images, text, and hyperlinks, and the display layout of the content components is designated using a tag description such as HTML, the invention contributes to appropriate division according to the display layout of the original contents when the divided contents are provided to a mobile terminal or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Document Processing Apparatus (AREA)
- User Interface Of Digital Computer (AREA)
- Information Transfer Between Computers (AREA)
- Digital Computer Display Output (AREA)
Abstract
Description
- [Patent document 1] Japanese Unexamined Patent Application, First Publication No. 2001-229106
- [Patent document 2] Japanese Unexamined Patent Application, First Publication No. 2006-155147
- [Non-patent document 1] Gen HATTORI, Kazunori MATSUMOTO, Fumiaki SUGAYA, “Dynamic Segmentation of a Web Page Based on Content-Distance Distribution”, Information Processing Society Paper (transaction) database (TOD), Vol. 47 No. SIG8, June, 2006
- 1 Content conversion server (content conversion system)
- 11 Content data acquiring unit
- 12 Layout related tag determining unit
- 13 Primary division unit
- 14 Secondary division unit
- 15 Reconfiguration unit
Density Dj = Display Area Sj / Number of Divisions Nk
Partial Display Area Si concerning Table = “Number of Pixels of Height Attribute” × “Number of Pixels of Width Attribute”
Partial Display Area Si concerning Text = “Number of Pixels of Size Attribute” × Number of Text Letters
Partial Display Area Si concerning Image = “Number of Pixels of Height Attribute” × “Number of Pixels of Width Attribute”
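The four formulas above amount to simple arithmetic, sketched below. The sketch assumes that the display area Sj of a primary divided document is the sum of the partial display areas Si of its components, which is not stated in the formulas themselves. All function names and the sample pixel values are hypothetical.

```python
def table_area(height_px: int, width_px: int) -> int:
    """Partial display area Si of a table: height attribute x width attribute."""
    return height_px * width_px


def text_area(size_px: int, letter_count: int) -> int:
    """Partial display area Si of text: size attribute x number of letters."""
    return size_px * letter_count


def image_area(height_px: int, width_px: int) -> int:
    """Partial display area Si of an image: height attribute x width attribute."""
    return height_px * width_px


def density(display_area: float, number_of_divisions: int) -> float:
    """Density Dj = display area Sj / number of divisions Nk."""
    return display_area / number_of_divisions


if __name__ == "__main__":
    # Assumption for this sketch: Sj is the sum of the partial areas Si.
    s_j = table_area(100, 200) + text_area(12, 50) + image_area(40, 60)
    print(density(s_j, 4))
```

A primary divided document whose density is still at or above the predetermined value would, per the description, be a candidate for further division.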
Claims (3)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2007-122527 | 2007-05-07 | ||
JP2007122527A JP4919870B2 (en) | 2007-05-07 | 2007-05-07 | Content conversion system and computer program |
PCT/JP2008/058435 WO2008136514A1 (en) | 2007-05-07 | 2008-05-02 | Content conversion system and recording medium recorded with computer program |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100138738A1 (en) | 2010-06-03 |
US8700995B2 (en) | 2014-04-15 |
Family
ID=39943617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/598,503 (US8700995B2, Expired - Fee Related) | Content conversion system and recording medium storing computer program | 2007-05-07 | 2008-05-02 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8700995B2 (en) |
JP (1) | JP4919870B2 (en) |
KR (1) | KR20090130418A (en) |
WO (1) | WO2008136514A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101340885B1 (en) * | 2011-02-25 | 2013-12-13 | 숭실대학교산학협력단 | Information content managing server, apparatus and method for providing information content |
KR101873917B1 (en) * | 2011-11-17 | 2018-07-04 | 삼성전자 주식회사 | Display apparatus and control method thereof |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001229106A (en) | 2000-02-18 | 2001-08-24 | Hitachi Ltd | Contents conversion system |
US20040049737A1 (en) * | 2000-04-26 | 2004-03-11 | Novarra, Inc. | System and method for displaying information content with selective horizontal scrolling |
KR20050040030A (en) | 2003-10-27 | 2005-05-03 | 한국전자통신연구원 | Method and apparatus for displaying web page in terminal |
JP2006155147A (en) | 2004-11-29 | 2006-06-15 | Kddi Corp | Content conversion system and computer program |
US20060149775A1 (en) * | 2004-12-30 | 2006-07-06 | Daniel Egnor | Document segmentation based on visual gaps |
US7225397B2 (en) * | 2001-02-09 | 2007-05-29 | International Business Machines Corporation | Display annotation and layout processing |
US7246306B2 (en) * | 2002-06-21 | 2007-07-17 | Microsoft Corporation | Web information presentation structure for web page authoring |
- 2007-05-07 JP JP2007122527A patent/JP4919870B2/en not_active Expired - Fee Related
- 2008-05-02 US US12/598,503 patent/US8700995B2/en not_active Expired - Fee Related
- 2008-05-02 KR KR1020097024688A patent/KR20090130418A/en not_active Application Discontinuation
- 2008-05-02 WO PCT/JP2008/058435 patent/WO2008136514A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
Gen Hattori, et al., Fumiaki Sugaya, "Dynamic Segmentation of a Web page Based on Content-Distance Distribution", Information Processing Society Paper (transaction) database (TOD), vol. 47 No. SIG8, Jun. 2006. |
Yuki Arase et al., "Keitai Denwa o Mochiita Web Etsuran no Tameno Contents Tekioteki Teiji System" Transactions of Information Processing Society of Japan, Dec. 15, 2006, vol. 47, No. 12, pp. 3149 to 3164. |
Also Published As
Publication number | Publication date |
---|---|
JP2008276694A (en) | 2008-11-13 |
US20100138738A1 (en) | 2010-06-03 |
WO2008136514A1 (en) | 2008-11-13 |
JP4919870B2 (en) | 2012-04-18 |
KR20090130418A (en) | 2009-12-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10592737B2 (en) | Mathematical formula learner support system | |
US9026526B1 (en) | Providing images of named resources in response to a search query | |
US7203901B2 (en) | Small form factor web browsing | |
US9390077B2 (en) | Document division method and system | |
US8898296B2 (en) | Detection of boilerplate content | |
US20110173188A1 (en) | System and method for mobile document preview | |
US9582486B2 (en) | Apparatus and method for classifying and analyzing documents including text | |
US8051371B2 (en) | Document analysis system and document adaptation system | |
CA2918840C (en) | Presenting fixed format documents in reflowed format | |
EP1624383A2 (en) | Adaptive system and process for client/server based document layout | |
JPH10228473A (en) | Document picture processing method, document picture processor and storage medium | |
US20150161094A1 (en) | Apparatus and method for automatically generating visual annotation based on visual language | |
CN103399885A (en) | Mining method and device of POI (point of interest) representing images and server | |
CN110209780B (en) | Question template generation method and device, server and storage medium | |
KR100463835B1 (en) | Index extraction method of web contents transcoding system for small display devices | |
US8700995B2 (en) | Content conversion system and recording medium storing computer program | |
CN114625996A (en) | Webpage content paging method and device, electronic equipment and readable storage medium | |
US20100083093A1 (en) | Content Conversion System and Computer Program | |
US20120084637A1 (en) | Image processing apparatus, image processing method, and storage medium storing image processing program | |
CN106557537B (en) | Webpage picture label display method and device | |
KR20160109302A (en) | Knowledge Based Service System, Sever for Providing Knowledge Based Service, Method for Knowledge Based Service, and Computer Readable Recording Medium | |
KR100900488B1 (en) | Systems and method of printing web-pages | |
US7395266B2 (en) | Portable terminal and method of controlling the same | |
JP4624086B2 (en) | Content conversion system and computer program | |
US20130104014A1 (en) | Viewer unit, server unit, display control method, digital comic editing method and non-transitory computer-readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: KDDI CORPORATION, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HATTORI, GEN; SUGAYA, FUMIAKI; REEL/FRAME: 023455/0959. Effective date: 20091026 |
 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.) |
 | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.) |
 | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
 | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20180415 |