WO1999023584A2 - Information component management system - Google Patents
- Publication number: WO1999023584A2 (PCT/US1998/023193)
- Authority: WO, WIPO (PCT)
- Prior art keywords: information, information component, document, xml, component
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Definitions
- The present invention relates to an information component management system. Specifically, the system of the present invention enables documents, images and other types of information to be packaged within an active information component object, which can then be stored, retrieved and manipulated according to content rather than according to form.
- Documents can be defined as a collection of ideas and information, organized within a certain structure.
- The ideas and information may be logically linked according to various relationships, but as a whole should follow a common theme.
- The collection itself is expressed as a combination of text and graphic items.
- Ideas can be expressed with words or graphics.
- Data can be in the form of numbers, symbols, graphics or even sounds.
- The final element, structure, is an important element of a document, yet it is often overlooked as a separate entity.
- The structure of a document is the way in which the data and ideas are organized within the document, thereby giving these data and ideas additional significance.
- The first category of prior art system is the document management system.
- This system was originally designed to enable searches for information according to specific keywords within defined database fields.
- This underlying system design has many disadvantages. For example, the types of searches are limited by the structure of the database itself.
- Information must be extracted from the document and entered into the database manually, which is time-consuming, expensive and prone to human error.
- Structured management systems therefore have significant drawbacks for document management.
- Non-structured text retrieval systems solve certain problems but also create new difficulties. These systems enable automatic indexing of information, without the need for human intervention. However, in non-structured retrieval systems, only the free text of the document is automatically indexed. Therefore, only free text from the document can be searched. Although free text is an important component of a document, such a system loses the other types of available information. Furthermore, the context of ideas or concepts within a document is largely lost by the automatic indexing procedure, leaving the user with a collection of disconnected textual segments or documents which are divorced from the general theme expressed by the entire document. Thus, the user must often read an entire document or a collection of search results in order to find the desired information.
- The information component management system of the present invention enables documents, images and other types of information to be packaged within an active information component object, which can then be stored, retrieved and manipulated according to content rather than according to form.
- The information component includes concepts or ideas, data and structure as separate but related entities.
- An information component system for storing an original document, comprising: (a) at least one information component for storing information from the original document, the at least one information component featuring at least one information component primitive; (b) an information component identifier for classifying the at least one information component according to at least one information component class; and (c) at least one property of the at least one information component.
- A system for displaying a native file format document, the document including text and having a native file format and a native document appearance, the native file format including at least one instruction for displaying the text of the native file format document, the system comprising: (a) a Web browser for displaying the native file format document according to the native document appearance; and (b) an HTML rendering engine for obtaining information regarding the native document appearance of the native file format document, for translating the information into a raster file having a raster format displayable by the Web browser, and for giving the translated information to the Web browser, such that the Web browser is able to display the native file format document.
- A method for managing information, comprising the steps of: (a) capturing the information in an electronic format; (b) converting the captured information into an information component, the information component featuring: (i) a pointer to a storage location of the captured information; (ii) at least one method for manipulating the captured information; and (iii) at least one property of the captured information; (c) storing the information component; and (d) displaying the information component such that the captured information appears in substantially the original format.
- An information component comprising a software object including: (a) a pointer to a storage location of the stored original information; (b) at least one method for manipulating the stored original information; and (c) at least one property of the stored original information.
- A server for serving stored information to a client Web browser, the server comprising: (a) a database for storing the stored information; and (b) an image processor for accessing the stored information from the database and transforming the stored information into a Searchable Image Format (SIF) file, the SIF file being accessed by the client Web browser, such that the stored information is displayed by the client Web browser.
- SIF: Searchable Image Format
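The claimed information component (a pointer to the stored original, methods that manipulate it, and properties, plus the class assigned by the identifier) can be pictured as a small object. The sketch below is written in Python purely for illustration (the patent itself prefers Java Bean objects); the class name `InformationComponent`, its fields and its methods are hypothetical, and the storage location is assumed to be a local file.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict


@dataclass
class InformationComponent:
    """Hypothetical sketch of the claimed information component object."""

    storage_location: Path          # (a) pointer to where the original is stored
    ic_class: str                   # class assigned by the information component identifier
    properties: Dict[str, Any] = field(default_factory=dict)  # (c) properties

    # (b) methods for manipulating the stored original information
    def read_original(self) -> bytes:
        """Fetch the original bytes unchanged, so they can be shown in their native form."""
        return self.storage_location.read_bytes()

    def set_property(self, name: str, value: Any) -> None:
        self.properties[name] = value

    def get_property(self, name: str, default: Any = None) -> Any:
        return self.properties.get(name, default)
```

Keeping only a pointer to the original, rather than a converted copy, is what lets the component later display the information in substantially its original format.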
- The term "computing platform" refers to a particular computer hardware system or to a particular software operating system.
- Examples of hardware systems include, but are not limited to, personal computers (PC), Macintosh™ computers, mainframes, minicomputers and workstations.
- Examples of software operating systems include, but are not limited to, UNIX.
- The term "software object" includes any software application capable of substantially independent execution by an operating system.
- A software application, whether a software object or substantially any other type of software application, could be written in substantially any suitable programming language, which could easily be selected by one of ordinary skill in the art.
- Examples of suitable programming languages include, but are not limited to, C, C++ and Java.
- The term "Web browser" refers to any software program which can be used to view a document written at least partially with at least one instruction taken from HTML (HyperText Mark-up Language) or VRML (Virtual Reality Modeling Language), or any other equivalent computer document language, hereinafter collectively and generally referred to as "document mark-up language".
- Examples of Web browsers include, but are not limited to, Mosaic™, Netscape Navigator™ and Microsoft™ Internet Explorer™.
- The term "raster format" refers to any image format supported by Web browsers including, but not limited to, GIF (Graphics Interchange Format), JPEG (Joint Photographic Experts Group) and PNG (Portable Network Graphics).
- FIGS. 1A-1C are schematic block diagrams of various exemplary information components and classes;
- FIGS. 2A and 2B are schematic block diagrams of the general architecture of an exemplary system of the present invention;
- FIGS. 3A and 3B are schematic block diagrams of a preferred embodiment of the IC (information component);
- FIG. 4 is a schematic block diagram of an exemplary IC component generator of the present invention;
- FIG. 5 is a schematic block diagram of a preferred embodiment of the IC Server of the present invention;
- FIG. 6 is a schematic block diagram of a preferred embodiment of the IC Client of the present invention;
- FIG. 7 is a schematic block diagram of a preferred embodiment of the system of the present invention as implemented in XML;
- FIG. 8 is a schematic block diagram of dynamic link management according to the present invention;
- FIG. 9 is a schematic block diagram of DTD normalization according to the present invention;
- FIG. 10 is a schematic block diagram of an exemplary system for HTML rendering according to the present invention;
- FIG. 11 shows an exemplary output of the system of FIG. 10;
- FIG. 12 shows an exemplary embodiment of the IC Server of the present invention as implemented with Java Bean objects; and
- FIG. 13 shows an exemplary embodiment of the IC Client of the present invention as implemented with Java Bean objects.
- The information component management system of the present invention enables documents, images and other types of structured or non-structured information to be analyzed.
- The underlying structure of the information is determined, and the structure is exposed to a database.
- The information is then packaged within an active information component object, which can then be stored, retrieved and manipulated according to content rather than according to form.
- The information component includes concepts or ideas, data and structure as separate but related entities.
- Information components are linked to each other according to a particular relationship, which may be either parallel or hierarchical.
- For example, an image of a person's face is an information component which may in turn be a portion of a larger object, such as a group photo, which may in turn be a portion of an article.
- The image of the face, the group photo and the article are all individual information components which are linked according to a hierarchical structure.
- Each information component inherits the features of all associated information components which are higher in the hierarchical structure, and in turn contributes to the pool of features characterizing associated information components which are lower in the hierarchical structure.
- Information components therefore have both content related to the actual stored information and content related to the features of associated, higher-level components.
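A minimal sketch of the face, group photo and article hierarchy, assuming each component holds a dictionary of features and a reference to its parent. The `Component` class and the example feature values are hypothetical, and the rule that a lower-level value overrides an inherited value with the same key is an assumption the text does not state.

```python
from typing import Dict, List, Optional


class Component:
    """Hypothetical information component node with a link to its parent."""

    def __init__(self, name: str, features: Dict[str, str],
                 parent: Optional["Component"] = None) -> None:
        self.name = name
        self.features = features
        self.parent = parent

    def effective_features(self) -> Dict[str, str]:
        """Own features plus the features of every ancestor.

        Assumption: if the same key appears at several levels,
        the value from the lowest (most specific) level wins.
        """
        chain: List[Component] = []
        node: Optional[Component] = self
        while node is not None:           # walk up to the root
            chain.append(node)
            node = node.parent
        merged: Dict[str, str] = {}
        for ancestor in reversed(chain):  # root first, self last
            merged.update(ancestor.features)
        return merged


article = Component("article", {"publication": "Example Daily", "page": "3"})
photo = Component("group photo", {"caption": "Board meeting"}, parent=article)
face = Component("face", {"person": "person A"}, parent=photo)
```

A search for the publication name can therefore match the face component, even though that feature is stored only on the article.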
- The actual stored information from an information component is displayed in substantially the same format as the original source format, so as to maintain the original appearance as much as possible.
- The displayed information maintains substantially the same fonts, graphics and structure, so that a newspaper page, for example, is displayed as a substantially exact reproduction of the page as it originally appeared in newsprint.
- The system of the present invention therefore has a clear advantage over prior art document management systems, which usually display retrieved information only as text. Even if graphic images are also displayed, these prior art systems do not maintain the structure of the entire document or the visual relationship between the text and the images.
- The information component management system of the present invention is able to search for, and retrieve, information based upon all characteristics of the information component, including graphic images, text and structural relationships. Results are presented as intuitive, visually explicit objects which are easy to examine, manipulate and navigate. Furthermore, the search results are ranked by relevance to the desired search strategy, where the rank is determined from both the full content and the complete characteristics of the information component.
- The system of the present invention is based on two basic principles: object-oriented management and visual information retrieval. Both principles will be explained in greater detail below, in the Description of the Preferred Embodiments. Briefly, the information components are managed as objects which belong to an information class. Different information classes are linked according to the logical relationship between the components in each class. Overall, the classes are placed within a hierarchical structure, in which each child class inherits the properties of the parent class. Each information class defines the properties and operations of a set of information components.
- Each information component is a representation of information, combining structured and non-structured data.
- The information component also features methods for accessing and manipulating the information, including the data interface and any data operations. Because the methods of the information component are exposed to the general computational environment, the component either can be displayed, or can display itself, on any type of computing platform or operating system. Thus, the information component is both compatible across different computing platforms and has an open, easily accessible interface.
- Several procedures must be performed in order to prepare such an information component. First, the information must be identified. Next, the information must be classified and the actual information component must be created. The relationship between the new information component and other information component(s) must then be identified. Finally, the behavior of the completed information component is determined according to the functionality of the attributes or features which accrue to that component after classification and identification of relationships.
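The preparation steps above (identify, classify and create, relate, determine behavior) can be sketched as a chain of functions. The rule tables `CLASS_RULES` and `CLASS_BEHAVIORS`, the `IMG:` prefix test and all function names are hypothetical placeholders for whatever identification and classification logic a real system would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical rule tables; the text does not specify real ones.
CLASS_RULES: Dict[str, str] = {"image": "picture", "text": "article"}
CLASS_BEHAVIORS: Dict[str, List[str]] = {
    "picture": ["display", "zoom"],
    "article": ["display", "search_text"],
}


@dataclass
class DraftComponent:
    raw: str
    kind: str = ""
    ic_class: str = ""
    related: List[str] = field(default_factory=list)
    behaviors: List[str] = field(default_factory=list)


def identify(d: DraftComponent) -> DraftComponent:
    """Step 1: identify what kind of information was captured."""
    d.kind = "image" if d.raw.startswith("IMG:") else "text"
    return d


def classify(d: DraftComponent) -> DraftComponent:
    """Step 2: assign an information class (the component is now created)."""
    d.ic_class = CLASS_RULES[d.kind]
    return d


def relate(d: DraftComponent, parent_id: Optional[str]) -> DraftComponent:
    """Step 3: record the relationship to an existing component, if any."""
    if parent_id:
        d.related.append(parent_id)
    return d


def determine_behavior(d: DraftComponent) -> DraftComponent:
    """Step 4: behavior follows from the class and the relationships."""
    d.behaviors = list(CLASS_BEHAVIORS[d.ic_class])
    if d.related:
        d.behaviors.append("navigate_to_parent")
    return d


def build(raw: str, parent_id: Optional[str] = None) -> DraftComponent:
    return determine_behavior(relate(classify(identify(DraftComponent(raw))), parent_id))
```

The ordering matters: behavior is decided last because it depends on both the class and the relationships found in the earlier steps.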
- The information component can be searched and retrieved through visual information search and retrieval. Briefly, the search can be performed according to keywords, visual examples and graphic attributes. Visual examples include images or graphic objects which are compared to graphic information stored in the database, just as a keyword search involves the comparison of keywords to text stored in the database.
- Graphic attributes include font size, font attributes and the relative positioning of information within a document. These attributes can also be used as search parameters. Thus, the search is not limited to a simple keyword comparison of stored textual information.
- The system of the present invention also includes a mechanism for learning the preferences and profile of an individual user, which can then be used to calculate the relevance ranking of the retrieved information.
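To make the combined ranking concrete, here is a sketch that scores hits by keyword matches, a graphic attribute (font size) and a learned per-section user preference. The scoring formula, the `Hit` fields and the profile format are assumptions for illustration only; the text does not specify how the factors are weighted, and search by visual example is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    doc_id: str
    text: str
    font_size: float   # graphic attribute, in points
    section: str       # structural attribute, e.g. the newspaper section


def score(hit: Hit, keywords: List[str], min_font: float,
          profile: Dict[str, float]) -> float:
    words = hit.text.lower().split()
    keyword_score = sum(words.count(k.lower()) for k in keywords)
    if keyword_score == 0:
        return 0.0                      # no textual match: not retrieved
    # Graphic-attribute criterion: text at or above min_font (e.g. a headline) scores higher.
    graphic_bonus = 1.0 if hit.font_size >= min_font else 0.0
    # Learned preference of this user for the section (1.0 means neutral).
    preference = profile.get(hit.section, 1.0)
    return (keyword_score + graphic_bonus) * preference


def rank(hits: List[Hit], keywords: List[str], min_font: float,
         profile: Dict[str, float]) -> List[str]:
    """Return matching document ids, most relevant first."""
    scored = [(score(h, keywords, min_font, profile), h.doc_id) for h in hits]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for s, doc_id in scored if s > 0]
```

With a profile that favors the business section, a headline mention can outrank a body-text passage that repeats the keyword more often.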
- The present invention is of an information component management system, in which information is packaged as an information component, including textual data, images and structure. Information components are related to each other according to a hierarchical organization, in which characteristics of components which are higher in the hierarchy accrue to those components which are lower in the hierarchy.
- Information components can be searched and retrieved according to all attributes of the actual information, as well as the characteristics of the component and the relationships between components.
- The information component management system of the present invention is not limited to simple storage, search and retrieval of textual data only, but instead preserves all aspects of the original source of information.
- The detailed description of the system of the present invention is divided into four chapters.
- The first chapter describes various background art technologies which are the preferred support technologies for the system of the present invention. These technologies are described as "background art" because they do not fulfill the same functions as the system of the present invention, but instead merely enable these functions. These technologies are given as examples only and are not intended to be limiting in any way.
- The second chapter provides a brief overview of the entire system according to the present invention.
- The third chapter describes an exemplary implementation of the present invention with objects in an XML environment.
- The fourth chapter describes an exemplary implementation of the present invention with Java Bean objects in a CORBA environment.
- The background art technologies described in this chapter are well known in the art.
- The description provided herein is not intended to be exhaustive, but rather to describe those aspects of the background art technologies which are optionally implemented to support the management system of the present invention.
- The preferred background art technologies described herein include XML, described in section 1; CORBA and a particular proprietary embodiment of CORBA, described in section 2; and the Java Bean component architecture, described in section 3.
Section 1: XML
- XML is preferably used together with ActiveX™ objects as the front end, such that the information components of the present invention are preferably accessed through XML.
- The acronym "XML" stands for "Extensible Markup Language".
- XML is a document markup language which was designed to have greater functionality than HTML (HyperText Markup Language).
- Documents written in HTML can, however, be converted into XML.
- A document written in XML is a collection of XML elements, which can be images or sections of text, for example.
- The document itself features elements which are indicated with "tags". These elements have logical values.
- Each element can also have "child elements", which are other elements referenced by that element, known as the "parent element".
- This element structure is a hierarchical tree which enables complex elements to be composed of multiple simpler elements.
- Documents written in XML optionally feature a DTD (Document Type Definition), which is either included within the XML document or is a separate but associated document.
- The DTD contains the rules according to which the XML document should be interpreted, such as the declarations for the structures of the elements within the XML document.
- The term "HTML document" refers to a document written in HTML.
- The term "XML document" refers to a document written in XML.
- The option of including the DTD both increases the flexibility of XML and enables documents written in XML to be validated, in order to ensure that these XML documents conform to the rules in the DTD.
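The parent/child element tree can be seen by parsing a small XML document with Python's standard `xml.etree.ElementTree` module and listing each element indented under its parent. The document content is made up. Note that `ElementTree` is non-validating; checking a document against a DTD requires a validating parser such as `lxml`.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <title>Annual report</title>
  <section>
    <para>First paragraph.</para>
    <figure src="chart.gif"/>
  </section>
</article>"""


def outline(element: ET.Element, depth: int = 0) -> list:
    """Return the element tree as indented tag names, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:               # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(DOC))))
```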
- An additional useful feature of XML is its more powerful linking structure.
- The links of XML are compatible with those of HTML.
- XML allows any type of element to act as a link.
- The start or the finish of an XML link does not need to be located within one of the documents which are being linked.
- XML links could even be located in a document which is entirely separate from the two linked documents. This enables two documents to be linked more easily after both documents have been created, without altering either document.
- XML links include simple links and extended links.
- A simple link is similar to an HTML link, in that it is unidirectional and has only one locator.
- An extended link can have more than one locator, such that the extended link can "point" to more than one target resource.
- Extended links can also be bidirectional or multidirectional. As noted previously, these extended links may be located in a separate file, external to the XML document, and can therefore be very difficult to manage. For example, when a linked file is deleted or otherwise removed, the extended link list is not amended, potentially leading to a broken link.
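The maintenance problem with out-of-line extended links can be illustrated with a link list that is checked against the set of documents that still exist. `ExtendedLink` and `broken_locators` are hypothetical names, and a locator is assumed to be a file name with an optional `#fragment`; this is not the XML linking syntax itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ExtendedLink:
    """Out-of-line link stored apart from the documents it connects."""
    name: str
    locators: List[str]          # several target resources, e.g. "b.xml#sec2"


def broken_locators(links: List[ExtendedLink], existing: Set[str]) -> Dict[str, List[str]]:
    """Report, per link, the locators whose target document no longer exists."""
    report: Dict[str, List[str]] = {}
    for link in links:
        missing = [loc for loc in link.locators
                   if loc.split("#", 1)[0] not in existing]   # ignore the fragment part
        if missing:
            report[link.name] = missing
    return report
```

Because nothing updates the external link list when `c.xml` is deleted, a check like this has to be run separately, which is the management burden the text describes.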
- XML links can also point to target resources which are fragments of a document.
- The boundaries of these fragments can be determined through either static "chunking" or dynamic "chunking".
- In static chunking, the XML file is manually divided into pieces, with a new XML file for each piece, which is then linked to the main XML file.
- In dynamic chunking, the XML file is not divided into new files. Rather, separators are placed within the XML file to indicate the boundaries of chunks. These separators can be used to define a portion of a document which is to be retrieved, such that the fragments are separated and served "on the fly".
- Dynamic chunking has the disadvantage of significantly increasing server overhead, as the server must determine which fragment is to be served, such that the XML server may become overloaded.
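The difference between the two chunking strategies can be sketched in a few lines, assuming chunk boundaries are marked by a hypothetical `<?chunk?>` processing instruction. Static chunking splits once and stores the pieces; dynamic chunking re-splits the whole file on every request, which is where the extra server overhead comes from.

```python
from typing import Dict

SEPARATOR = "<?chunk?>"   # hypothetical boundary marker placed inside the XML file


def static_chunks(xml_text: str) -> Dict[str, str]:
    """Static chunking: split once, ahead of time, into separately stored files."""
    pieces = xml_text.split(SEPARATOR)
    return {f"part{i}.xml": piece.strip() for i, piece in enumerate(pieces)}


def serve_chunk(xml_text: str, index: int) -> str:
    """Dynamic chunking: the file stays whole and is split again on every request."""
    return xml_text.split(SEPARATOR)[index].strip()
```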
- XML documents also have style sheets, which feature construction rules describing how each element should be displayed. For example, if the element is a paragraph of text, the construction rule may indicate the font size and type, the indentation of the first line, the spacing between lines of the paragraph and so forth.
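A style sheet's construction rules amount to a mapping from element names to display properties. The sketch below, with hypothetical rule values and a CSS-like output format chosen only for readability, shows how the rule for a paragraph (font, first-line indent, line spacing) might be looked up per element.

```python
from typing import Dict

# Hypothetical construction rules keyed by element name.
STYLE_RULES: Dict[str, Dict[str, str]] = {
    "para": {"font-family": "Times", "font-size": "10pt",
             "text-indent": "2em", "line-height": "1.2"},
    "title": {"font-family": "Helvetica", "font-size": "18pt"},
}


def style_for(tag: str) -> str:
    """Render the construction rule for one element as a CSS-like declaration."""
    rules = STYLE_RULES.get(tag, {})
    body = "; ".join(f"{name}: {value}" for name, value in rules.items())
    return f"{tag} {{ {body} }}"
```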
- Both the information components of the present invention and the management system for these components are compliant with the CORBA (Common Object Request Broker Architecture) standard, which is a standard for communication between distributed objects established by the OMG (Object Management Group).
- CORBA: Common Object Request Broker Architecture
- OMG: Object Management Group
- The OMG is a consortium of over 700 different software developers.
- Standards developed by the OMG are industry-wide, and software applications compliant with these standards should be able to interact successfully with other compliant applications, as described below.
- CORBA provides a standard method for executing program modules in a distributed environment, regardless of the programming language in which the modules are written or the computing platform on which they are executed.
- CORBA enables complex systems to be built, integrating many different types of computing platforms within an entire business, for example.
- ORB: Object Request Broker
- Each application is an "object" with a particular interface through which communication is enabled.
- The ORB acts as the "middle-man", passing information and requests for service to each object as necessary.
- The ORB permits truly distributed computing, since different objects do not need to be operated by the same computer or even reside on the same network.
- The ORB directs any communication to the appropriate server which contains the object, which might be located on the same host as the client object or on a different host.
- The ORB then redirects the results back to the client object.
- CORBA can also be described as an "object bus" because it is a communications interface through which objects are located and accessed.
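The ORB's role as middle-man (locate the object, execute the request on its host, return the results) can be imitated in-process. `ToyBroker` is a hypothetical teaching stand-in with no networking, IDL or marshalling, so it shows only the routing idea and not CORBA's actual API.

```python
from typing import Any, Dict, Tuple


class ToyBroker:
    """Hypothetical in-process stand-in for an ORB: no network, IDL or marshalling."""

    def __init__(self) -> None:
        self._registry: Dict[str, Tuple[str, Any]] = {}   # name -> (host, servant)

    def register(self, name: str, host: str, servant: Any) -> None:
        self._registry[name] = (host, servant)

    def locate(self, name: str) -> str:
        """Return the host on which the named object lives."""
        return self._registry[name][0]

    def invoke(self, name: str, operation: str, *args: Any) -> Any:
        _, servant = self._registry[name]              # locate the target object
        result = getattr(servant, operation)(*args)    # execute the request there
        return result                                  # hand the result back to the caller


class PageStore:
    """Example server object; the data is made up."""

    def page_count(self, doc_id: str) -> int:
        return {"doc10": 4}.get(doc_id, 0)
```

The client names only the object and the operation; it never needs to know which host the object lives on, which is the location transparency the "object bus" description refers to.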
- CORBA provides IIOP (Internet Inter-ORB Protocol), which is the CORBA message protocol for communication on the Internet.
- IIOP links GIOP (CORBA's General Inter-ORB Protocol) to TCP/IP, the general communication protocol of the Internet.
- GIOP in turn specifies how one ORB communicates with another ORB.
- One type of proprietary ORB can therefore communicate with another, different type of proprietary ORB on a different host computer according to a combination of the IIOP and GIOP protocols. Practically speaking, if IIOP is built into a Web browser such as Netscape™ Navigator™, a Java applet is downloaded into the Web browser when the user accesses a Web page with a CORBA-compatible object. The Java applet invokes the ORB to first pass data to the object, then execute the object and finally retrieve the results. Further information on both CORBA and IIOP can be obtained from the "TechWeb Technology Encyclopedia" (http://www.techweb.com/encyclopedia as of September 10, 1997).
- WRB: Web Request Broker
- Oracle Corp., Redwood Shores, California, USA
- The WRB is described in a white paper:
- M. Anand et al., "The Web Request Broker: a Framework for Distributed Web-based Applications", http://www.olab.com/www6_l/paper.html as of September 10, 1
- Briefly, the WRB architecture includes the dispatcher, application and system cartridges, and a CORBA-compliant ORB.
- The dispatcher and cartridges use the ORB for communication between components, so that these components can be distributed on separate remote machines.
- The dispatcher routes requests from the HTTP daemon to the appropriate cartridge.
- The cartridges are software components which perform a specific function and are thus the "objects" described previously. Cartridges are used within the system of the present invention as an exemplary support for a number of different functions, as described in subsequent sections.
- Cartridges have a name, composed of the IP address of the server where the cartridge is located and the virtual path to the location of the cartridge on that server. Cartridges also have a standard interface, which includes a number of methods. Examples of such methods include the authenticate routine, which determines whether the client is entitled to the requested services, and the exec routine, which receives the particular service request if the authentication routine is successfully performed.
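A sketch of the dispatcher and cartridge pattern described above: the dispatcher looks up a cartridge by virtual path, calls `authenticate`, and only then calls `exec`. The class names, the access list and the response strings are hypothetical; this mirrors the described flow, not Oracle's actual WRB interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Cartridge(ABC):
    """Hypothetical cartridge: a named component with authenticate/exec routines."""

    def __init__(self, server_ip: str, virtual_path: str) -> None:
        self.name = f"{server_ip}{virtual_path}"       # IP address + virtual path

    @abstractmethod
    def authenticate(self, client: str) -> bool: ...

    @abstractmethod
    def exec(self, request: str) -> str: ...


class SearchCartridge(Cartridge):
    def authenticate(self, client: str) -> bool:
        return client in {"alice"}                     # illustrative access list

    def exec(self, request: str) -> str:
        return f"results for {request!r}"


class Dispatcher:
    """Routes a request from the HTTP daemon to the cartridge for its path."""

    def __init__(self) -> None:
        self.by_path: Dict[str, Cartridge] = {}

    def install(self, path: str, cartridge: Cartridge) -> None:
        self.by_path[path] = cartridge

    def dispatch(self, path: str, client: str, request: str) -> str:
        cartridge = self.by_path.get(path)
        if cartridge is None:
            return "404 Not Found"
        if not cartridge.authenticate(client):         # exec runs only after authentication
            return "403 Forbidden"
        return cartridge.exec(request)
```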
- The cartridge technology provides a fully developed basis for the creation of particular software functionality.
- An advantage of proprietary cartridge technology for software development is that the system architecture provides a framework for interaction between different objects over the Internet by using HTTP Web servers and existing Web browsers.
- By contrast, the CORBA protocols only define a standard, but do not provide any specific implementation.
- The proprietary cartridge technology enables one of ordinary skill in the art to develop a software application which can communicate with other applications over the World Wide Web.
- Java Bean is a component software architecture which operates in the Java programming environment.
- Java is an interpreter-driven, object-oriented computer programming language which is substantially platform-independent.
- Software packages written in Java can be operated by any operating system, or platform, which supports the Java interpreter.
- A Java Bean component can run remotely and independently as a discrete software application object in a distributed computing environment, using either the Remote Method Invocation protocol of Sun Microsystems, Inc. or CORBA.
- Information components are preferably packaged and then distributed as independent Java Bean components.
- The Java Bean component software architecture is a set of APIs (Application Programming Interfaces) and rules which enable software developers to define software components that can be dynamically combined to create a software application.
- The Java Bean component model has two major elements: components and containers.
- Components range in size and capability from small GUI (graphical user interface) widgets such as a button, to applet-sized functionality such as a tabular viewer, and even to a full-sized application such as an HTML (HyperText Mark-up Language) viewer or the information component of the present invention.
- Components can have a visual aspect, such as a button, can actually be visual information, or can be non-visual, such as a data-based monitoring component.
- Containers hold an assembly of related components.
- Containers provide the context for components to be arranged and to interact with each other. Containers are occasionally referred to as "forms", "pages", "frames" or "shells".
- Containers can also be components, so that a container can be used as a component inside another container.
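Because a container is itself a component, the component model is an instance of the composite pattern. A minimal Python sketch, with hypothetical class and component names:

```python
from typing import List


class BeanComponent:
    """Hypothetical stand-in for a Java Bean component."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self, depth: int = 0) -> List[str]:
        return ["  " * depth + self.name]


class Container(BeanComponent):
    """A container holds related components and is itself a component."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.children: List[BeanComponent] = []

    def add(self, component: BeanComponent) -> "Container":
        self.children.append(component)
        return self                      # allow chained add() calls

    def describe(self, depth: int = 0) -> List[str]:
        lines = super().describe(depth)
        for child in self.children:      # nested containers recurse naturally
            lines.extend(child.describe(depth + 1))
        return lines


toolbar = Container("toolbar").add(BeanComponent("zoom button"))
page = Container("page").add(toolbar).add(BeanComponent("image viewer"))
```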
- The Java Bean component model provides the following major types of services: component interface exposure and discovery; component properties; event handling; persistence; application builder support; and component packaging.
- Component interface exposure and discovery allows components to expose their interface so that they can be driven dynamically by calls and event notifications from other components or application scripts.
- Component properties are the public attributes of a component which either directly reflect or affect the current state of that component.
- For example, properties could include the "foreground color" of a video clip, its zoom factor or its access rights. The state of these properties can be interrogated or modified through standard mechanisms.
- Event handling is the mechanism for components to "raise" or "broadcast" events and have those events delivered to the appropriate component or components which need to be notified. Typically, notified components then perform a particular function in response. For example, if the user interface shows a document image clip on the monitor screen, the Parent Information Object event will communicate with the Object Server to transmit the full page of the clip, and will send a viewing command to the full-page viewer component.
- Event handling allows information components to interact with each other.
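Properties and event handling together behave like Java Bean bound properties, which Java implements with `java.beans.PropertyChangeSupport` and `PropertyChangeListener`. The Python analogue below is a hypothetical sketch: setting a property such as the zoom factor notifies every registered listener, and setting a property to its current value raises no event.

```python
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any, Any], None]   # (property name, old value, new value)


class BoundProperties:
    """Python analogue of a Java Bean with bound properties."""

    def __init__(self, **initial: Any) -> None:
        self._props: Dict[str, Any] = dict(initial)
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def get(self, name: str) -> Any:
        return self._props[name]

    def set(self, name: str, value: Any) -> None:
        old = self._props.get(name)
        self._props[name] = value
        if old != value:                      # broadcast only real changes
            for listener in self._listeners:
                listener(name, old, value)
```

For example, a full-page viewer component could register a listener on a clip's `zoom` property and redraw itself whenever it changes.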
- Persistence is the mechanism for storing the state of a component in a non-volatile location. The component state is stored in the context of the container and in relation to other components. For example, if the user wants to save the viewing zoom factor for all of the following documents, the persistence mechanism would support this.
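A sketch of persistence, assuming component state is saved as JSON keyed by component name under its container. The file layout and function names are hypothetical; Java Beans themselves typically rely on Java serialization (or `java.beans.XMLEncoder`) instead.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_state(path: Path, container: str, states: Dict[str, Dict[str, Any]]) -> None:
    """Write each component's state, keyed by component name, within its container."""
    document = {"container": container, "components": states}
    path.write_text(json.dumps(document, indent=2), encoding="utf-8")


def load_state(path: Path) -> Dict[str, Dict[str, Any]]:
    """Read the saved states back so that a new session can restore them."""
    return json.loads(path.read_text(encoding="utf-8"))["components"]
```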
- Application builder support interfaces enable components to expose their properties and behaviors to application builder development tools. Using these interfaces, the tools can determine the properties and behaviors, or events and methods, of arbitrary components.
- The tools can provide mechanisms such as tool palettes, inspectors and editors, which the application developer uses to assemble an application. Through these mechanisms, the application developer can modify the state and appearance of components as well as establish relationships between components.
- This mechanism enables sophisticated information applications such as HyperText links to be created.
- For example, the user can define a button which appears on the viewed document and links the document to a different document.
- The application developer will use property editors to specify the appearance of the button, including its size, color and label, as well as the link type and the link target.
- Because Java Bean components can be distributed and independently deployed over a network, there is a need to provide a facility to physically "package" the resources included in an information component so that they are accessible to the other Java Bean components.
- Packaging is performed with the JAR (Java Archive) file format.
- The JAR file format enables the class file of the information component and other information component resources, such as images, OMS (object mapping structure), sounds and link information, to be packaged as a single physical entity for distribution.
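Since a JAR file is a ZIP archive with a `META-INF/MANIFEST.MF` entry, the packaging step can be sketched with Python's `zipfile` module. The entry names are hypothetical; the `Java-Bean: True` manifest attribute is how the JavaBeans specification marks a class in a JAR as a bean. In practice the JDK's `jar` tool would normally build the archive.

```python
import zipfile
from pathlib import Path
from typing import Dict


def package_component(jar_path: Path, bean_class: str, entries: Dict[str, bytes]) -> None:
    """Bundle a component's class file and resources into a single JAR (ZIP) file."""
    # The manifest is written first; its per-entry section marks the bean class.
    manifest = (
        "Manifest-Version: 1.0\r\n\r\n"
        f"Name: {bean_class}\r\n"
        "Java-Bean: True\r\n\r\n"
    )
    with zipfile.ZipFile(jar_path, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest)
        for name, data in entries.items():
            jar.writestr(name, data)
```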
- Section 1 provides a general description of information components.
- Section 2 describes the system of the present invention.
- Section 3 describes the information component content capturer in more detail.
- Section 4 describes the information component identifier.
- Section 5 describes the information component contributor.
- Section 6 describes the information component server.
- Section 7 describes the information component publisher and client.
- Section 1 Information Component
- Each information component has a number of different elements and properties.
- Each information component belongs to an information class.
- The information class defines the properties and operations of a group of information components.
- Information classes can describe a newspaper, a general document or a video clip, for example.
- Figures 1A-1C are illustrations of exemplary information components, each of which can be placed in different classes.
- Figure 1A is a general description of an exemplary document 10, showing its hierarchical structure.
- Document 10 is in turn subdivided into a number of page components 12, of which four are shown for the purposes of illustration only. Page component 12 is a member of the page class, which stores properties related to the structure of a page of document 10.
- These properties include textual information, structural information and any links to other components.
- The operations, or methods, include retrieving the textual information, for example. Thus, the operations are used to store, retrieve or modify information contained in the properties of components which are members of the page class.
- Every information component which is a page component 12, and hence which belongs to the page class, may share certain repeated structural features. These features are also examples of information components, and are described as "shared information components". Every page component 12 can include these shared information components, for example in order to maintain a uniform structure between pages and to decrease the storage space required for such repetitive features. As shown in Figure 1A, these shared information components for page component 12 include a footer component 14, a header component 16 and a logo component 18. These are intended as examples only and are not meant to be limiting in any way.
- Header component 16 could be a title, such as "Document Report", or any other desired information.
- Footer component 14 could feature a page number, a date, or any other desired information.
- Logo component 18 would be the logo of the particular company which is producing document 10, for example.
- Each page component 12 also includes one or more information components which are either not shared at all, or which are only shared with certain other page components.
- For example, the page component 12 labeled "Page 1" includes a summary section component 20, which is a member of the summary class.
- The summary class could feature text and/or images which summarize an earlier portion of document 10, for example.
- Summary section component 20 is included only within this page component 12.
- The "Page 1" page component 12 also features a "Chapter 1" component 22, which is shared by the "Page 2" and "Page 3" page components 12.
- The "Page 4" page component 12 features a "Chapter 3" component 24. Summary section component 20, "Chapter 1" component 22 and "Chapter 3" component 24 are all further subdivided into a plurality of paragraph components 26, which are members of the paragraph class.
- Paragraph components 26 contain the information related to each paragraph, which may include text, for example.
- The text in each paragraph component 26 is contained within a text component 28 as shown, which belongs to the text class.
- Paragraph component 26 can optionally include an image component 30 and a table component 32 as shown.
- Table component 32 stores a table and belongs to the table class.
- Text component 28, image component 30 and table component 32 are examples of information component primitives.
- An information component primitive is the most basic unit of information components, such that the primitive is no longer divisible into information components which are lower in the hierarchical structure.
- Information component primitives can preferably be shared between information components.
- For example, a table of data which is an example of table component 32 could be included both in summary section component 20 and in "Chapter 1" component 22.
- Shared information component primitives also need to be stored only once in order to be available to other information components.
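The hierarchy of Figure 1A, including a primitive shared by two parents, can be modeled as a small tree in which a shared node is a single object referenced from several places. The sketch below is a hypothetical Python model (the InfoComponent class and unique_primitives function are not from the patent); it shows that table component 32, although reachable from both summary section component 20 and "Chapter 1" component 22, is stored only once.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity: each object is one stored component
class InfoComponent:
    """A node in the information component hierarchy (hypothetical model)."""
    ic_class: str  # e.g. "page", "paragraph", "text", "table"
    name: str
    children: list["InfoComponent"] = field(default_factory=list)

    @property
    def is_primitive(self) -> bool:
        # A primitive cannot be subdivided into lower-level components.
        return not self.children


def unique_primitives(root: InfoComponent) -> list[InfoComponent]:
    """Collect every primitive reachable from root, each stored object once."""
    seen: dict[int, InfoComponent] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_primitive:
            seen.setdefault(id(node), node)
        stack.extend(node.children)
    return list(seen.values())


# Table component 32 is one object referenced from two paragraphs.
table_32 = InfoComponent("table", "Table 32")
summary_20 = InfoComponent("summary", "Summary 20", [
    InfoComponent("paragraph", "P1", [InfoComponent("text", "T1"), table_32])])
chapter_22 = InfoComponent("chapter", "Chapter 1", [
    InfoComponent("paragraph", "P2", [InfoComponent("text", "T2"), table_32])])
page_1 = InfoComponent("page", "Page 1", [summary_20, chapter_22])
print(len(unique_primitives(page_1)))  # 3: T1, T2 and the shared table
```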
- Figures 1B and 1C show portions of certain specific examples of information components in terms of an exemplary class structure, it being understood that this is for the purposes of description only and is not meant to be limiting in any way.
- As shown in Figure 1B, a newspaper information component belongs to a newspaper class 34, which defines the properties and operations of components which contain newspaper pages.
- Newspaper class 34 has an article class 36 for an individual newspaper article. Article class 36 inherits the properties of the parent class, newspaper class 34.
- Article class 36 may have additional properties and methods, such as the coordinates of the location of the article within the newspaper page, or an operation for retrieving the name of the author of the article.
- A column class 38 is shown for a column, while an image class 40 is shown for a picture.
- Image class 40 might have information about pictures which are associated with the article.
- Column class 38 might contain information about the structure of the column which contains the article.
- Column class 38 and image class 40 are related to article class 36 according to a defined set of relationships.
- Figure 1C shows an exemplary video clip information class 42 which contains information such as data and structure for a segment of recorded video.
- A video stream information class 44 is the highest level class in the hierarchy.
- A video clip information class 46 is next in the hierarchy, followed by a frame class 48.
- Frame class 48 might contain only information regarding a single frame of the video. Thus, even though a video may be considered as a sequential collection of images which give the illusion of movement, it too can be broken down into smaller elements which are then stored in the above-mentioned information classes.
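The inheritance between newspaper class 34 and article class 36 maps directly onto subclassing. In this hypothetical Python sketch, publication and issue_date stand in for whatever properties newspaper class 34 actually defines; the article subclass adds the page coordinates and the author-retrieval operation mentioned above.

```python
class NewspaperComponent:
    """Properties shared by every component in the newspaper class (sketch)."""

    def __init__(self, publication: str, issue_date: str):
        self.publication = publication  # hypothetical inherited property
        self.issue_date = issue_date    # hypothetical inherited property


class ArticleComponent(NewspaperComponent):
    """An article inherits the newspaper properties and adds its own."""

    def __init__(self, publication: str, issue_date: str,
                 author: str, bbox: tuple[int, int, int, int]):
        super().__init__(publication, issue_date)
        self.author = author
        self.bbox = bbox  # (left, top, right, bottom) of the article on the page

    def get_author(self) -> str:
        # The "retrieve the author's name" operation of article class 36.
        return self.author
```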
- Figures 2A and 2B show the general architecture of the system of the present invention.
- A general system architecture 50 includes IC Contributor 60, IC Server 62, IC Search Engine 63 and IC Publisher 65.
- IC Contributor 60 further features IC (Information Component) Content Capturer 52, IC Knowledge Base 54, IC Rules Editor 56 and IC Identifier 58.
- IC Content Capturer 52 is responsible for the acquisition and conversion of information content, and for the transmission of the converted information content to IC Identifier 58.
- IC Identifier 58 then identifies information components according to certain rules and to class information stored in IC Knowledge Base 54. Both the rules and the class information can be added, removed or otherwise altered with IC Rules Editor 56.
- The original document and the identified information components are then transmitted from IC Contributor 60 to IC Server 62.
- IC Server 62 stores and manages the actual or "original" information, such as documents, multimedia objects and other types of information entities, and also manages the information components themselves.
- Information components are made available from IC Server 62 upon a request through IC Search Engine 63, and are then published by IC Publisher 65.
- In general, the system of the present invention collects information from a variety of sources, packages the information into information components, and then stores the components for later retrieval by a client application.
- Section 3 Information Component Content Capturer
- This section describes the IC (information component) Content Capturer, which is shown in Figure 2B as part of IC Contributor 60.
- IC Content Capturer 52 preferably operates as memory-resident software and captures the desired information content from a variety of software systems including, but not limited to, a document editor 64 such as the Word product of Microsoft™, a media application 66 such as the Adobe™ Acrobat™ reader for reading PDF files, a facsimile machine software application 68 for operating a facsimile machine, and a Web browser software application 70 such as Netscape™ Navigator™. Additional software systems from which information content can be captured include imaging software and spreadsheet software. These software systems are intended as illustrative examples only, since substantially any software system which handles, stores, retrieves or manipulates information could have that information captured by IC Content Capturer 52.
- IC Content Capturer 52 invokes the appropriate software drivers for handling the different information formats of the above software systems.
- For example, information could be captured from a document stored in the format of Microsoft™ Word™ word processing software.
- A number of possible methods could be used to capture the information contained within the document, two illustrative examples of which are given here, it being understood that these are for discussion purposes only and are not meant to be limiting.
- In the first method, IC Content Capturer 52 interacts with Microsoft™ Word™ and instructs Microsoft™ Word™ to place the document on the "clipboard".
- The "clipboard" is a feature of a number of different computer operating systems, in particular the operating systems of Microsoft Corporation.
- As used herein, the term "clipboard" refers to any feature of a computer operating system which enables information to be exchanged between two software applications.
- In the second method, IC Content Capturer 52 captures the necessary information about the document through substantially direct interaction with the software system, such as Microsoft™ Word™. Such interaction can be performed according to a number of different methods. For example, Microsoft™ Word™ enables other software applications to obtain this information through the creation of a "macro". Alternatively, IC Content Capturer 52 could include a printer driver, which would enable Microsoft™ Word™ to "print" the document either directly to IC Content Capturer 52 or to a file in a format accessible by IC Content Capturer 52. Regardless of the specific method employed, the content of the information is obtained from the captured information by using a particular software driver.
- Each software driver is specific to a particular information source format, such as an electronically scanned paper document, an electronic document such as a word processing document, a video clip, a document sent by facsimile and other such formats.
- Each driver is a channel to an information processing unit for a specific type of information, and invokes a process specific to the source of that information.
- The content information is stored in an internal unified format for data processing and information component recognition, access and retrieval. The information in the unified internal file format is then sent to IC Identifier 58.
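One way to realize the per-format drivers is a registry that maps an information source format to a function producing the unified internal format. The sketch below assumes a hypothetical UnifiedContent record and two toy drivers; real drivers would wrap OCR, PDF conversion or the clipboard, macro and printer-driver interfaces described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UnifiedContent:
    """Hypothetical internal unified format handed to the IC Identifier."""
    source_format: str
    text: str
    images: list[bytes]


Driver = Callable[[bytes], UnifiedContent]
_DRIVERS: dict[str, Driver] = {}


def register_driver(source_format: str):
    """Decorator that registers a driver for one information source format."""
    def wrap(func: Driver) -> Driver:
        _DRIVERS[source_format] = func
        return func
    return wrap


@register_driver("plain-text")
def text_driver(raw: bytes) -> UnifiedContent:
    return UnifiedContent("plain-text", raw.decode("utf-8"), [])


@register_driver("raster-image")
def image_driver(raw: bytes) -> UnifiedContent:
    # A real driver would run OCR here; this sketch keeps only the image.
    return UnifiedContent("raster-image", "", [raw])


def capture(source_format: str, raw: bytes) -> UnifiedContent:
    """Route captured content to the driver registered for its format."""
    try:
        driver = _DRIVERS[source_format]
    except KeyError:
        raise ValueError(f"no driver registered for {source_format!r}") from None
    return driver(raw)
```

New source formats are supported by registering another driver, without touching the dispatch code.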
- Section 4 Information Component Identifier and Knowledge Base
- IC Identifier 58 automatically identifies and creates information components from the information passed from IC Content Capturer 52 according to rules stored in IC Knowledge Base 54.
- The information is first analyzed to extract the information component primitives, which, as described previously, form the most basic unit of information. These primitives include text, images, vector graphics and other such basic units of information.
- Next, the information components themselves are constructed from the information component primitives, and the relationships between components are determined according to rules stored in IC Knowledge Base 54.
- The information components are then classified, again according to rules stored in IC Knowledge Base 54. This classification determines security attributes, indexing rules and other publishing parameters.
- Finally, the information is transferred to IC Server 62.
- Figure 3A shows a portion of IC Contributor 60 in more detail, focusing on those components which interact with IC Identifier 58.
- IC Identifier 58 has three layers: a primitive identifier 64, a component constructor 66 and a component classifier 68.
- Primitive identifier 64 examines the received information at two levels. First, the textual information is identified and separated into individual elements, according to the structure of the type of information. The second level of examination is visual identification, which includes determining the visual attributes and structure of the information. At the end of this two-level examination, the information component primitives have been identified according to rules stored in IC Knowledge Base 54.
- Component constructor 66 then constructs the information component from one or more information component primitives and/or from one or more information components which are lower in the hierarchy.
- The information component includes such information as the identity of the primitive(s) or lower information component(s) from which it is constructed.
- The relationships between components are determined by component constructor 66 according to rules stored in IC Knowledge Base 54. An illustrative example of this process is disclosed in U.S. Application No.
- The disclosed process includes the following steps. First, the document is converted into a digital raster format, for example by scanning a paper document, and is stored in an electronic file. This step is preferably performed by IC Content Capturer 52. Next, the converted document is preferably enhanced, for example to improve the quality of the image.
- The enhanced raster format file is then converted into two electronic files, collectively called a "binary/raster file".
- The first file has the enhanced raster format.
- The second file has pointers to the enhanced raster format file. Every data element in the raster format file, such as textual information or an entire graphic image, could have a corresponding pointer in the second file.
- The two files are preferably produced, at least in part, by an automatic text recognition process such as OCR, which enables the image of the text to be realized as textual data.
- The information is then stored as information components composed of information primitives, as previously described.
- Indices for information retrieval are then created.
- At this point, the original document has been subdivided and stored as a collection of information components.
- These information elements preferably include a raster image of the document, a pointer to the storage location of the original document, any text contained within the document and the coordinates of the words of the text within the document. More specifically, the coordinates preferably include all information which is necessary to locate the word within the document, such as the number of the page on which the word falls, the number of the word on the page and the coordinates of the rectangle which bounds the word on the page, or "bounding rectangle". The bounding rectangle determines the area occupied by the word on a page and is necessary to fully reproduce the visual aspects of that word. Thus, the coordinates of each word numerically describe the visual appearance of the word.
- IC Content Capturer 52 performs OCR (Optical Character Recognition) to obtain the textual information from the image stored in the electronic file by converting the image of each letter into the letter itself. Both the text itself and the coordinates of individual words are then available.
- Other examples of such processes include pattern recognition and PDF conversion. It should be noted that these processes are already well known in the art for the creation and manipulation of information in a particular information source format.
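The word coordinates described above can be held as simple records in the pointer half of the binary/raster file. This hypothetical sketch assumes pixel coordinates with the origin at the top-left of the page (so top is smaller than bottom); it shows how a text hit can be mapped back to a region of the raster image, and how the words inside a region of a page can be recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingRect:
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class WordRecord:
    """One entry of the pointer file: a recognized word and where it sits."""
    text: str
    page: int           # page on which the word falls
    word_no: int        # ordinal of the word on that page
    rect: BoundingRect  # area the word occupies on the raster page


def locate(records: list[WordRecord], word: str) -> list[tuple[int, int, BoundingRect]]:
    """Find every occurrence of word (case-insensitive) and where to highlight it."""
    needle = word.lower()
    return [(r.page, r.word_no, r.rect) for r in records if r.text.lower() == needle]


def words_in_region(records: list[WordRecord], page: int,
                    region: BoundingRect) -> list[WordRecord]:
    """Return the words on page whose bounding rectangle lies inside region."""
    return [
        r for r in records
        if r.page == page
        and r.rect.left >= region.left and r.rect.right <= region.right
        and r.rect.top >= region.top and r.rect.bottom <= region.bottom
    ]
```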
- The information elements which are produced are then identified according to the type of information component primitive which they represent, which is in turn determined according to rules in IC Knowledge Base 54. For example, every individual image identified in the steps above would be determined to be an image information component primitive. Similarly, the text extracted in the steps above would be determined to be a text information component primitive according to information stored in the textual database. Other information component primitives could also be identified from the collection of information elements. After the information component primitives have been identified, the primitives are used to construct information components, according to rules stored in IC Knowledge Base 54. For example, in document component 10 of Figure 1A, the information primitives include image information component primitive 30 and text information component primitive 28. These primitives are in turn used to build paragraph 26.
- Paragraph 26 now contains information concerning not only the inclusion of one or more image information component primitives 30, for example, but also such information as the relative geometrical location of each primitive within paragraph 26.
- The geometrical location of the primitive was determined when the primitive itself was identified, for example as described above.
- The primitives are first assembled into information components which are relatively lower in the hierarchy, for example paragraph 26, and these components are then in turn assembled into information components which are higher in the hierarchy.
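The bottom-up assembly, in which each parent records where its parts sit relative to itself, can be sketched as follows. The rectangle convention (left, top, right, bottom), the Node class and the choice of the parent's top-left corner as the reference point are assumptions for illustration only.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (left, top, right, bottom), page units


@dataclass
class Node:
    ic_class: str
    rect: Rect  # absolute position on the page
    children: list["Node"] = field(default_factory=list)
    offsets: list[tuple[float, float]] = field(default_factory=list)


def union(rects: list[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def assemble(ic_class: str, parts: list[Node]) -> Node:
    """Build a higher-level component from lower-level ones.

    The parent's rectangle encloses all parts, and each part's offset is stored
    relative to the parent's top-left corner.
    """
    rect = union([p.rect for p in parts])
    offsets = [(p.rect[0] - rect[0], p.rect[1] - rect[1]) for p in parts]
    return Node(ic_class, rect, list(parts), offsets)
```

Calling assemble first on primitives (to build a paragraph) and then on the resulting paragraphs (to build a chapter) mirrors the lower-to-higher order described above.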
- Each individual information component is then classified by component classifier 68 according to rules in IC Knowledge Base 54.
- The individual component is compared to components listed within the knowledge base, and is recognized as a unique and individual element belonging to a larger information cluster.
- Each component is classified first by assignment to a primary information class, and then by placement within the hierarchical structure of information sub-classes belonging to that primary class.
- Figure 3B shows a schematic block diagram of an exemplary IC Knowledge Base 54.
- First, the document class for the information component is determined.
- The different document classes are stored within IC Knowledge Base 54 in a document class table 70.
- For example, the document could be a research report, a newspaper, or substantially any other type of document which has been placed within document class table 70.
- Next, the information component is classified according to a particular information component class stored in an IC class table 72.
- These classes could include, but are not limited to, a logo, a main title, publishing information, summary, and so forth.
- Each class is in turn identified according to rules stored in a rules table 74.
- These rules are composed of tokens, including constants 76, functions 78 and operations 80.
- Each rule could optionally be stored in a "flat (text) file" for example, in which case the tokens would preferably be stored as text strings separated by spaces. Of course, many other options are also available for storing these rules.
- Each rule preferably includes the following tokens in the following order, although of course other rule structures could be used: the name of the IC class, the hierarchy level of the information component, the font type, the size range, the color, the case of the letters, the location on the page at which the information component is found and the text which the information component should contain (if any).
- Because a rule does not necessarily need to include all of these tokens, an absent token can be indicated by a place-holding character such as a slash ("/"), for example.
- PageNo Helvetica-Bold 11.0-11.05 / / B /
- In this example rule, an information component named PageNo is identified by any text of any color in any letter case, in font Helvetica-Bold, of any size between 11 and 11.05, which is located at the bottom of the page (indicated by the letter "B").
- Similarly, an information component named Section is identified by any text of any color in any letter case, in font TimesNewRoman or TimesNewRoman-Bold, of any size between 9.50 and 10.50, which is located anywhere on the page.
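The rule format lends itself to a small parser and matcher. This is a sketch under several assumptions: it follows the seven-token layout of the PageNo example (the hierarchy-level token listed earlier does not appear there), treats "/" as "any value", does not model alternatives such as the two fonts of the Section rule or text tokens containing spaces, and describes each extracted text element as a dictionary with the hypothetical keys text, font, size, color, case and location.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "/"  # place-holding token for an absent constraint


@dataclass
class Rule:
    ic_class: str
    font: Optional[str]
    size_range: Optional[tuple[float, float]]
    color: Optional[str]
    case: Optional[str]
    location: Optional[str]
    text: Optional[str]


def parse_rule(line: str) -> Rule:
    """Parse a space-separated rule such as 'PageNo Helvetica-Bold 11.0-11.05 / / B /'."""
    tokens = line.split()
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, got {len(tokens)}")
    name, font, size, color, case, location, text = [
        None if t == ANY else t for t in tokens
    ]
    size_range = None
    if size is not None:
        low, high = size.split("-")
        size_range = (float(low), float(high))
    return Rule(name, font, size_range, color, case, location, text)


def matches(rule: Rule, element: dict) -> bool:
    """Check one extracted text element against every constraint the rule sets."""
    if rule.font is not None and element["font"] != rule.font:
        return False
    if rule.size_range is not None:
        low, high = rule.size_range
        if not low <= element["size"] <= high:
            return False
    for key in ("color", "case", "location"):
        wanted = getattr(rule, key)
        if wanted is not None and element[key] != wanted:
            return False
    if rule.text is not None and rule.text not in element["text"]:
        return False
    return True


def classify(rules: list[Rule], element: dict) -> Optional[str]:
    """Return the IC class of the first matching rule, or None."""
    for rule in rules:
        if matches(rule, element):
            return rule.ic_class
    return None
```

Storing rules as one text line each, as suggested above, means that a rules editor only needs to read and write these lines.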
- IC Rules Editor 56 is preferably a GUI (graphical user interface), which more preferably allows the user to define new rules, enter new information, delete old rules or information, and amend or alter rules or information.
- Section 5 Information Component Contributor
- IC Contributor 60 also prepares the information components for publication and for storage in a database, such that the information components can be served to a client by IC Server 62.
- IC Contributor 60 features a component generator 82.
- Component generator 82 transforms the classified information component into a standard format including, but not limited to, an active object format such as a COM object or a Java Bean object, or a flat file format such as DHTML. Generally, component generator 82 packages the classified information component according to the standard format, so that the packaged information component is accessible by IC Server 62.
- Descriptions of the transformation of the information component into two of these object-oriented formats are given in further detail below.
- In Chapter III, Section 2, a description is provided of the transformation of the information component into an object in an XML environment.
- In Chapter IV, Section 1, a description is provided of the transformation of the information component into a Java Bean object in a CORBA environment.
- IC Contributor 60 is also able to render and to store information components as a DHTML document.
- IC Contributor 60 converts the data for each primitive of each information component into equivalent fragments in DHTML format.
- The graphic elements are converted to raster images (in GIF format).
- The text elements are converted to a set of DHTML <DIV> blocks.
- There are two types of DHTML <DIV> blocks: a style block and a value block.
- The style block defines the style attributes of the text, such as font size and name, font-weight, font-style and color.
- The value block defines the position of the text element within the current primitive and its text value. When the text value contains more than one word, the text value is inserted into a <NOBR> block to prevent the Web browser from breaking the line within the given text element.
- The DHTML fragment is optimized to ensure that each style block with specific characteristics appears only once in the DHTML fragment for the primitive.
- The original fonts are preferably substituted by fonts available to the Web browser, with possible modifications of font size.
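A sketch of the DHTML conversion, under the assumption that each distinct style block is emitted once as an outer <DIV> wrapping all the value blocks that use that style. The grouping strategy, the Style and TextElement classes and the absolute-positioning CSS (which assumes the enclosing primitive container is itself positioned) are illustrative choices, not details taken from the patent. Font substitution is omitted.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    font_family: str
    font_size_pt: float
    font_weight: str = "normal"
    font_style: str = "normal"
    color: str = "#000000"

    def css(self) -> str:
        return (f"font-family:{self.font_family};font-size:{self.font_size_pt}pt;"
                f"font-weight:{self.font_weight};font-style:{self.font_style};"
                f"color:{self.color}")


@dataclass(frozen=True)
class TextElement:
    text: str
    left: int  # position within the current primitive, in pixels
    top: int
    style: Style


def to_dhtml(elements: list[TextElement]) -> str:
    """Render text elements as DHTML, emitting each distinct style block once."""
    groups: dict[Style, list[TextElement]] = {}
    for el in elements:
        groups.setdefault(el.style, []).append(el)  # dicts keep insertion order
    out = []
    for style, members in groups.items():
        out.append(f'<DIV style="{style.css()}">')  # style block
        for el in members:
            value = html.escape(el.text)
            if len(el.text.split()) > 1:
                value = f"<NOBR>{value}</NOBR>"  # keep multi-word text on one line
            out.append(f'<DIV style="position:absolute;left:{el.left}px;top:{el.top}px">'
                       f"{value}</DIV>")  # value block
        out.append("</DIV>")
    return "\n".join(out)
```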
- IC Server 62 can then serve the information component to IC Search Engine 63 after receiving a request for a particular information component from IC Search Engine 63.
- IC Search Engine 63 receives a request for an information component, which is then made available to IC Publisher 65, which publishes the information component.
- IC Publisher 65 publishes the information component to a Web page, for example, or onto paper, as another example.
- Both IC Search Engine 63 and IC Server 62 must be able to communicate with each other, such that an information component can be requested. This communication is made possible by giving the information components a standard format.
- An object format is particularly preferred, because objects can be accessed through a predefined structure, which is more efficient for interacting with the information contents of the object.
- Both of the exemplary and preferred embodiments described below, in Chapter III, Sections 1 and 2 (XML) and in Chapter IV, Section 1 (Java Bean), have object formats for the information component. Of course, other types of formats could be used, such as DHTML.
- Section 6 Information Component Server
- IC Server 62 stores and manages the "original" information, such as documents, video segments, sounds and so forth.
- IC Server 62 locates the original information entity, isolates the corresponding information component and then returns the information component to the client application in some suitable format, for example as an HTML file.
- Section 7 Information Component Publisher and Client
- An IC Client 98 is able to send requests for information components to IC Server 62 through IC Search Engine 63.
- IC Client 98 is also able to receive such components from IC Server 62, optionally and preferably through the CORBA ORB.
- IC Client 98 preferably features some type of GUI (graphical user interface), which enables client applications to interact with the functionality of the information management system of the present invention.
- IC Publisher 65 is then able to publish the information component onto IC Client 98.
- The ability to access certain information components and to view these components on GUI interface 100 is controlled by two functions: automatic information component replacement and "white-label". These features provide customized views of the same documents to different user groups, while preventing the display of sensitive information components to specific users or to groups of users.
- The IC replacement table includes the following information: the class and the name of the information component to be replaced, the class and the name of the information component which is to replace it, and the user groups or individuals for whom the replacement should be performed.
- For example, consider the logo on a particular research report which is an information component called "Big Company X Report".
- IC Server 62 would replace the logo of "Big Company X" with the logo of "Another Big Company" when clients of "Another Big Company" request the Report.
- The white-label function is used to specify one or more information components in the original document which are not to be displayed on GUI interface 100, but which remain incorporated within the original document.
- The white-label function thus enables sensitive information to be protected from access through IC Client 98.
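The replacement table and the white-label list can be applied as a filter over the components of a document when building one user's view. This hypothetical sketch identifies components by (class, name) pairs and assumes that the white-label set passed in has already been selected for the requesting user; the stored original document is never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replacement:
    """One row of the IC replacement table."""
    old_class: str
    old_name: str
    new_class: str
    new_name: str
    groups: frozenset[str]  # user groups or individuals the row applies to


def customize_view(components: list[tuple[str, str]], user_groups: set[str],
                   replacements: list[Replacement],
                   white_label: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Produce the component list one user sees on the client GUI.

    White-labelled components are hidden from display only; they remain in
    the stored original, which this function leaves untouched.
    """
    view = []
    for comp in components:
        if comp in white_label:
            continue
        for row in replacements:
            if comp == (row.old_class, row.old_name) and row.groups & user_groups:
                comp = (row.new_class, row.new_name)
                break
        view.append(comp)
    return view
```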
- The objects are preferably compatible with the ActiveX™ architecture, although other types of objects could also be used, as long as they are compatible with the XML environment.
- The ActiveX™ objects could be constructed by the client from the information component objects according to the ActiveX™ architecture.
- Section 1 is an overview of the system when implemented with XML.
- Section 2 is a description of the IC Contributor when implemented with XML.
- Section 3 describes the IC XML Server.
- Section 4 describes the IC Search Engine when implemented with XML.
- Section 5 describes the IC Publisher and IC Client when implemented with XML.
- Figure 7 shows an overall view of a portion of the system of the present invention as implemented in XML.
- In this implementation, IC Contributor 60 becomes IC XML Contributor 200, and IC Server 62 becomes IC XML Server 202.
- IC XML Contributor 200 creates objects from the information components as XML-environment compatible objects.
- IC XML Server 202 provides access to a database 204, which is similar to database 84 of Chapter II. Database 204 stores the information components, which are implemented as XML-environment compatible objects.
- IC XML Server 202 also communicates with a DOM (Document Object Model) compliant interface, referred to as DOM Interface 208.
- Software programs which are compliant with the DOM protocol are able to communicate with other software programs, such as XML-compatible or XML-specific tools, for example Web browsers or software programs for editing XML documents.
- DOM Interface 208 acts as a gateway, enabling these XML tool software programs to communicate with information components through IC XML Server 202.
- XML tool software programs can therefore preferably edit and reuse information components directly from database 204, without conversion of the components.
- IC XML Server 202 provides one or more information components upon receiving a request from IC Universal Search Engine Adapter 214.
- IC Universal Search Engine Adapter 214 enables many different types of search engines to communicate with IC XML Server 202, such that a search can be made for specific information components within database 204.
- IC Universal Search Engine Adapter 214 also preferably controls access to IC XML Server 202, preferably including such functions as security and request access.
- IC Universal Search Engine Adapter 214 passes a request for an information component to IC XML Server 202, which then returns the desired information component.
- IC XML Publisher 210 optionally and preferably includes an HTML rendering engine 212 and a standard document rendering engine 216.
- The information component can then be displayed to the user in a number of ways, such as by printing the information on paper or by displaying the information on a Web page.
- In one case, IC XML Publisher 210 passes the information component to standard document rendering engine 216. If the information is to be displayed by a Web browser which can only handle HTML documents, then IC XML Publisher 210 passes the information component to HTML rendering engine 212. Other types of rendering are also possible, of course. Each of these parts of the system is described in greater detail in the sections below.
- This section describes a specific, preferred implementation of the IC Contributor, IC XML Contributor 200, for operation with XML-environment compatible objects such as those compatible with the ActiveX™ architecture.
- As XML-environment compatible objects, information components are organized in a hierarchical structure and linked to each other.
- Each XML-environment compatible object has methods, properties and data.
- The data itself is the classified information component obtained as described in Chapter II. Methods determine the ways in which the data and properties of the information component can be manipulated. For example, methods include ways to access the data, whether as an image, a video clip, a sound and so forth.
- Methods also include an application interface, so that another application would be able to interact with the information component and with the stored data, and with a GUI (graphical user interface).
- Other methods pertain to access control and to event handling. Event handling enables these objects to broadcast events and to have those events delivered to an appropriate component or components for notification. Thus, event handling provides methods for communication between components packaged as XML-environment compatible objects.
- The properties of the XML-environment compatible object include the internal structure of the object and the location of the data of the information component within the hierarchical structure of information components.
- Information components are composed of IC primitives, which are in turn used to build more complex structures which describe the relationships between information components.
- The location of the data of the information component within a hierarchy is important in order to be able to construct virtual documents and to understand the type and significance of the data within the information component.
- These properties include the correct tag for the type of data within the object, so that the object is correctly rendered within the XML environment, and the location of the object within the information component hierarchy. For example, if the type of data is a chapter of a book, then the correct tag might be the "chapter" tag. This tag identifies the type of XML element for the object, which is important for the later assembly of the data within the object as an element of an XML document.
- IC XML Contributor 200 packages the information component obtained from IC Identifier 58 into the XML-environment compatible object as follows. First, the data of the information component forms the data of the object. Next, the methods which can be used to interact with the object are determined. Certain of these methods are common to all such objects. Other methods, such as the method for accessing the type of data within the object, are particular to the type of data of the information component. Finally, the properties of the object are determined, for example according to the location of the data of the XML-environment compatible object within the information component hierarchy.
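The three packaging steps above (data, then methods, then properties) can be illustrated with a minimal sketch. It is not the patent's implementation: the class `XmlCompatibleObject`, the method tables `COMMON_METHODS` and `TYPE_METHODS`, and the function `package` are all hypothetical names chosen for illustration, and methods are modelled simply as callables stored in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class XmlCompatibleObject:
    """Hypothetical model of an XML-environment compatible object."""
    data: bytes                                            # the classified information component
    methods: Dict[str, Callable] = field(default_factory=dict)
    properties: Dict[str, object] = field(default_factory=dict)


# Methods assumed to be common to every object (illustrative names only).
COMMON_METHODS: Dict[str, Callable] = {
    "get_raw": lambda obj: obj.data,
}

# Type-specific access methods, keyed by the type of data (hypothetical).
TYPE_METHODS: Dict[str, Dict[str, Callable]] = {
    "text": {"as_text": lambda obj: obj.data.decode("utf-8")},
    "image": {"as_image_bytes": lambda obj: obj.data},
}


def package(data: bytes, data_type: str, tag: str, hierarchy_path: List[str]) -> XmlCompatibleObject:
    obj = XmlCompatibleObject(data=data)                   # step 1: component data becomes object data
    obj.methods.update(COMMON_METHODS)                     # step 2a: methods shared by all objects
    obj.methods.update(TYPE_METHODS.get(data_type, {}))    # step 2b: methods particular to the data type
    obj.properties["tag"] = tag                            # step 3: properties, e.g. the XML element tag
    obj.properties["location"] = "/".join(hierarchy_path)  # position in the IC hierarchy
    return obj
```

For example, `package(b"Chapter text", "text", "chapter", ["book", "chapter-3"])` yields an object whose `"tag"` property is `"chapter"`, which is what a later assembly step would use to place it as an XML element.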
- IC Server 62
- This section describes a specific implementation of IC Server 62, described in Chapter II, Section 6, for operation with XML-environment compatible objects.
- IC XML Server 202 accepts requests for information components and then serves them as XML elements assembled into an XML document.
- IC XML Server 202 manages the extended links of XML and normalizes the structure of the various DTDs for the XML documents.
- IC XML Server 202 enables the XML-environment compatible objects to be accessed by XML tool software programs without requiring conversion of the objects.
- XML documents are collections of one or more XML elements which are organized according to certain rules, which are held in the DTD of the XML document.
- IC XML Server 202 is preferably able to assemble XML documents "on the fly" in response to a request from a client application.
- For example, a client application might request a particular chapter of a book. This chapter could contain a chapter title, text and images. The chapter could also be further subdivided into sections, each of which would also have an organizational structure.
- The data required to assemble the chapter is contained within one or more IC XML elements.
- IC XML Server 202 must first locate all of the IC XML elements which are required for the chapter.
- Next, a style sheet is optionally selected for the XML document.
- The style sheet is optionally determined by the properties of the "chapter" IC XML element, which may indicate a particular style sheet to be used for that element.
- Alternatively, the style sheet could be determined according to specifications submitted by the client application, such that the preferences of the client application determine the style sheet.
- The IC XML elements are then assembled into the XML document, optionally according to the style sheet.
- The DTD for the XML document is then constructed according to the tags contained within the IC XML elements.
- The links for the XML document are then determined.
- Preferably, these links are extended links. More preferably, the extended links are managed as part of a document which is external to the XML document.
- These links are determined according to the link(s) of the IC XML element, which are included in the properties of the element. For example, one such link might link two sections of the chapter.
- Where a link points to another XML element, that other XML element is also assembled into a different XML document, such that the different XML document could also be served if necessary.
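The on-the-fly assembly sequence above (locate the elements, choose a style sheet, assemble, then derive a DTD from the tags actually used) can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustrative sketch only: the record layout `{"tag", "text", "children", "stylesheet"}` is an assumption, links are left out, and the generated DTD simply declares every element as mixed content. Letting a client-supplied style sheet override the element's own property is a design choice for the example, not something the text mandates.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def assemble_document(root_id: str, elements: Dict[str, dict],
                      stylesheet: Optional[str] = None) -> str:
    """Assemble an XML document from stored IC XML element records (hypothetical layout)."""
    tags_seen: Dict[str, List[str]] = {}          # tag -> child tags observed under it

    def build(elem_id: str) -> ET.Element:
        rec = elements[elem_id]                   # locate the required element
        node = ET.Element(rec["tag"])
        node.text = rec.get("text", "")
        kids = tags_seen.setdefault(rec["tag"], [])
        for child_id in rec.get("children", []):
            child = build(child_id)
            node.append(child)
            if child.tag not in kids:
                kids.append(child.tag)
        return node

    root = build(root_id)
    # Style sheet: the client's preference, else the root element's own property.
    sheet = stylesheet or elements[root_id].get("stylesheet")

    # DTD constructed from the tags contained in the assembled elements.
    decls = []
    for tag, children in tags_seen.items():
        content = "(#PCDATA" + "".join("|" + c for c in children) + ")*" if children else "(#PCDATA)"
        decls.append(f"<!ELEMENT {tag} {content}>")

    parts = ['<?xml version="1.0"?>']
    if sheet:
        parts.append(f'<?xml-stylesheet type="text/xsl" href="{sheet}"?>')
    parts.append(f"<!DOCTYPE {root.tag} [\n" + "\n".join(decls) + "\n]>")
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts)
```

Requesting a chapter whose record lists a title and a section as children produces a document rooted at `<chapter>` with an internal DTD declaring `chapter`, `title` and `section`.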
- Extended links are also objects which are stored externally, for example in database 204. Extended link objects are exposed as child objects of the IC XML-environment compatible objects, or resource objects, which they link.
- Each extended link object is preferably stored in a link table 218, which is then stored in database 204.
- The identity of each IC XML-environment compatible object is also stored in an IC table 220, which is also stored in database 204.
- A document table 222 is also stored in database 204.
- Document table 222 indicates how to assemble complete XML documents.
- These XML documents can be assembled into a format which closely resembles the format of the original document from which the information was obtained.
- Other "virtual" XML documents could also be assembled according to requests received from the client application.
- IC XML Server 202 manages the extended link objects dynamically, generating extended link objects as required. For example, if a document or an information component is removed from database 204, IC XML Server 202 updates link table 218, IC table 220 and document table 222 as necessary.
- More preferably, IC XML Server 202 sends an alert to the software tool which is attempting to remove the document or information component from database 204, alerting the user to the possible alteration of the link structure.
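The table maintenance described above can be sketched with the standard `sqlite3` module. The column layouts of link table 218, IC table 220 and document table 222 shown here are assumptions made for illustration, and `remove_component` is a hypothetical helper. It warns the requesting tool through a callback before removing an information component, then updates all three tables in a single transaction.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layouts for IC table 220, link table 218 and document table 222.
    conn.executescript("""
        CREATE TABLE ic_table   (ic_id TEXT PRIMARY KEY);
        CREATE TABLE link_table (link_id TEXT PRIMARY KEY,
                                 from_ic TEXT NOT NULL, to_ic TEXT NOT NULL);
        CREATE TABLE doc_table  (doc_id TEXT, position INTEGER, ic_id TEXT,
                                 PRIMARY KEY (doc_id, position));
    """)


def remove_component(conn: sqlite3.Connection, ic_id: str,
                     alert: Callable[[str], None] = print) -> int:
    """Remove one information component and keep the three tables consistent.

    Returns the number of extended links dropped. `alert` is called first so the
    tool requesting the removal can warn its user about the altered link structure.
    """
    n_links = conn.execute(
        "SELECT COUNT(*) FROM link_table WHERE from_ic = ? OR to_ic = ?",
        (ic_id, ic_id)).fetchone()[0]
    if n_links:
        alert(f"Removing {ic_id} alters {n_links} extended link(s)")
    with conn:  # commit all three updates together, or roll all of them back
        conn.execute("DELETE FROM link_table WHERE from_ic = ? OR to_ic = ?", (ic_id, ic_id))
        conn.execute("DELETE FROM doc_table WHERE ic_id = ?", (ic_id,))
        conn.execute("DELETE FROM ic_table WHERE ic_id = ?", (ic_id,))
    return n_links
```

Running the deletions inside one transaction means a reader never sees a link that points at a component already missing from the IC table.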
- IC XML Publisher 210 is described in greater detail in Section 4 below.
- IC XML Server 202 is preferably capable of serving many different types of XML documents, which may have different DTD structures. Such different structures can increase the difficulty of searching, retrieving and assembling IC XML elements. Furthermore, if IC XML elements use different tag names for what should be the same element, IC XML Server 202 may not be able to assemble the IC XML elements correctly. Therefore, IC XML Server 202 optionally and preferably performs DTD normalization for the XML elements and documents.
- DTD normalizer 232 compares the name of the tag (the text string associated with the tag) to the rule or rules of a DTD rules database 234. For example, if the name of the tag is "summary", a rule might state that "synopsis" should be used to replace "summary".
- If DTD rules database 234 does not have a rule for the name of that particular tag, then DTD normalizer 232 searches any information associated with the XML element having that tag in order to normalize the name of the tag.
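The two-stage normalization above (look up a rule first, otherwise inspect information associated with the element) can be sketched as follows. The rule dictionary stands in for DTD rules database 234, and its entries are invented examples apart from the "summary" to "synopsis" rule taken from the text. The fallback is a deliberately simple assumption: it only reads a hypothetical `role` attribute, whereas the text leaves the kind of associated information open.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Stand-in for DTD rules database 234: non-canonical tag name -> canonical tag name.
RULES: Dict[str, str] = {"summary": "synopsis", "abstract": "synopsis", "chap": "chapter"}


def guess_from_context(elem: ET.Element) -> Optional[str]:
    # Fallback when no rule exists. Assumption: only a "role" attribute is consulted.
    return elem.get("role")


def normalize(root: ET.Element) -> ET.Element:
    """Rename tags in place so that equivalent elements share one tag name."""
    for elem in root.iter():
        name = elem.tag.lower()
        if name in RULES:
            elem.tag = RULES[name]            # a rule exists for this tag name
        else:
            guess = guess_from_context(elem)  # no rule: use associated information
            if guess:
                elem.tag = guess
    return root
```

Tags with neither a rule nor usable associated information are left unchanged, so canonical names such as "chapter" pass through untouched.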
- XML tool software programs, such as editor programs for XML documents, are able to communicate with IC XML Server 202 through a "gateway" software module.
- These editor programs are able to create and manipulate virtual documents from XML-environment compatible objects stored in database 204 in substantially the same manner as XML documents are created and manipulated.
- IC Universal Search Engine Adapter 214 passes requests for information components to IC XML Server 202.
- IC Universal Search Engine Adapter 214 therefore controls access to IC XML Server 202, and hence to the information components.
- IC Universal Search Engine Adapter 214 preferably operates according to an HTTP-based protocol.
- The access offered through IC Universal Search Engine Adapter 214 can be determined by a software module or applet written in JavaScript, Java, ActiveX™ or C++, for example.
- IC Universal Search Engine Adapter 214 is preferably able to translate substantially any type of search query language into a format which is accessible to IC XML Server 202. More preferably, IC Universal Search Engine Adapter 214 includes a driver (not shown) for each search engine, such that a new type of search engine can be easily accommodated by altering the driver.
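The one-driver-per-search-engine arrangement described above can be sketched as a small registry. The internal query form (a list of `(field, term)` pairs) and both example drivers are assumptions made for illustration; they are not query languages defined in the patent. Supporting a new search engine means registering one more driver, while callers keep using `translate`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical internal query form accepted by the server: (field, term) pairs.
InternalQuery = List[Tuple[str, str]]


class SearchEngineAdapter:
    """Translate queries from several query languages through per-language drivers."""

    def __init__(self) -> None:
        self._drivers: Dict[str, Callable[[str], InternalQuery]] = {}

    def register_driver(self, language: str, driver: Callable[[str], InternalQuery]) -> None:
        self._drivers[language] = driver

    def translate(self, language: str, query: str) -> InternalQuery:
        try:
            driver = self._drivers[language]
        except KeyError:
            raise ValueError(f"no driver for query language {language!r}") from None
        return driver(query)


def keyword_driver(query: str) -> InternalQuery:
    # "keppel ship" -> [("text", "keppel"), ("text", "ship")]
    return [("text", word) for word in query.split()]


def fielded_driver(query: str) -> InternalQuery:
    # "title:keppel AND author:smith" -> [("title", "keppel"), ("author", "smith")]
    pairs = [part.split(":", 1) for part in query.split(" AND ")]
    return [(f.strip(), t.strip()) for f, t in pairs]
```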
- IC Universal Search Engine Adapter 214 is preferably built to be compatible with the particular architecture of IC XML Server 202, such that the client application requesting a particular information component would not need to be altered in order to be compatible with different search engines.
- Section 5 Specific Implementation of IC Publisher
- In this implementation, IC Publisher 63 is IC XML Publisher 210.
- IC XML Publisher 210 makes the information components accessible to the client application. The information component can then be presented to the user in a number of ways, such as by printing the information on paper or by displaying the information on a Web page.
- IC XML Publisher 210 passes the information component to standard document rendering engine 216.
- Standard document rendering engine 216 could output the information component according to the PostScript protocol, for example, in order to allow data exchange and communication with paper printing devices.
- IC XML Publisher 210 preferably passes the information component to HTML rendering engine 212.
- Other types of rendering are also possible, of course.
- HTML rendering engine 212 is able to render the XML document as an HTML or a DHTML document to be served to a Web browser.
- HTML rendering engine 212 is also able to render other document formats, such as PDF, word processing and image formats, as HTML documents. PDF could be rendered from a PostScript output, for example.
- The functions described for HTML rendering engine 212 could be used for rendering substantially any type of file in substantially any format as an HTML or DHTML document, as described below.
- Figure 10 is a schematic block diagram showing a preferred implementation of HTML rendering engine 212 and associated items according to the present invention.
- HTML rendering engine 212 mediates between a native file format processor 236 and a Web browser 238. Essentially, HTML rendering engine 212 enables a native file format document 240, which would normally be substantially accessible only to native file format processor 236, to be displayed by Web browser 238. Furthermore, the display of native file format document 240 by Web browser 238, as enabled by HTML rendering engine 212, is visually similar or identical to the display of native file format document 240 by native file format processor 236.
- Native file format processor 236 can be any software component or application which can access a native file format document 240.
- Examples of such software components or applications include, but are not limited to, an XML editing software program, word-processing software such as Microsoft™ Word™ and exchange format software such as Adobe Acrobat™ Exchange Reader.
- Examples of native file formats include, but are not limited to, the XML format, the DOC format for Microsoft™ Word™ and the PDF format for Adobe Acrobat™.
- The phrase "access a native file format document" is meant to connote that native file format processor 236 can display and manipulate native file format document 240 such that native file format document 240 is viewable with its native visual attributes or visual appearance.
- Native file format document 240 is preferably in the file format which native file format processor 236 is intended to handle.
- HTML rendering engine 212 is able to convert native file format document 240 into a raster image in a raster format which is displayable by Web browser 238, according to one of two preferred embodiments of the present invention. In the first preferred embodiment, HTML rendering engine 212 interacts with native file format processor 236 to obtain data regarding native file format document 240. In the second preferred embodiment, HTML rendering engine 212 directly accesses native file format document 240 substantially without any interaction with native file format processor 236.
- In the first preferred embodiment, HTML rendering engine 212 interacts with native file format processor 236 and instructs native file format processor 236 to place native file format document 240 on the "clipboard" (not shown).
- The "clipboard" is a feature of a number of different computer operating systems, in particular those operating systems of Microsoft Inc. (Seattle, Washington, USA), such as "Windows95™" and "Windows NT™", for example.
- The general function of the "clipboard" is to enable one software application, such as native file format processor 236, to make information available to another software application, such as HTML rendering engine 212.
- As used herein, the term "clipboard" refers to any feature of a computer operating system which enables information to be exchanged between two software applications.
- Once native file format document 240 has been copied to, or placed on, the clipboard, native file format document 240 is then pasted to HTML rendering engine 212 as a graphical image.
- HTML rendering engine 212 imports, or accesses, native file format document 240 as an image, which can then be converted to a raster image in a raster format. Additionally, HTML rendering engine 212 is able to obtain the necessary data about native file format document 240 through such "pasting".
- Alternatively, HTML rendering engine 212 receives the necessary information about native file format document 240 through substantially direct interaction with native file format processor 236. Such interaction can be performed according to a number of different methods. For example, Adobe Acrobat™ allows other software applications to obtain this information through the creation of a "plug-in". Microsoft™ Word™ enables other software applications to obtain this information through the creation of a "macro".
- As a further alternative, HTML rendering engine 212 could include a printer driver, which would enable native file format processor 236 to "print" native file format document 240 to an image format file. Such "printing" would also give HTML rendering engine 212 the necessary data about native file format document 240. Thus, HTML rendering engine 212 would obtain the necessary data about native file format document 240 through interaction with native file format processor 236.
- The second preferred embodiment of the present invention has many different possible implementations, one illustrative example of which is given here; this example is for discussion purposes only and is not meant to be limiting.
- The second preferred embodiment involves direct interaction of HTML rendering engine 212 with native file format document 240, substantially without any interaction with native file format processor 236.
- HTML rendering engine 212 preferably performs such interaction by understanding all or substantially all of the instructions contained within native file format document 240, in a similar or identical manner to native file format processor 236. These instructions are like those of any other computer software language, and as such can be understood and interpreted by software applications other than native file format processor 236.
- HTML rendering engine 212 thereby obtains the necessary data about native file format document 240.
- This data includes substantially all of the words of the text in native file format document 240, or at least of that portion of native file format document 240 which is to be displayed on Web browser 238.
- The data also includes the coordinates of each word within native file format document 240.
- The data preferably includes all attributes of each word and of the relationships between words, such as the font style and size, character attributes such as bold or italicized text, and spaces between characters and words.
- In combination, the data enable native file format document 240 to be reproduced with a substantially identical or identical appearance on Web browser 238.
- The coordinates preferably include all information which is necessary to locate the word within native file format document 240, such as the number of the page on which the word falls, the number of the word on the page and the coordinates of the rectangle which bounds the word on the page, or "bounding rectangle".
- The bounding rectangle determines the area occupied by the word on a page and is necessary to fully reproduce the visual aspects of that word.
- The coordinates of each word numerically describe the visual appearance of the word and, preferably in combination with the visual attributes of the word, enable the visual appearance of the word to be reproduced.
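The per-word data listed above (text, page number, word number on the page, bounding rectangle and visual attributes) maps naturally onto a small record type. The following is a sketch under stated assumptions: the field names, the `(left, top, right, bottom)` rectangle convention and the default font values are illustrative choices, and `words_for_page` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class WordRecord:
    text: str
    page: int                                   # number of the page on which the word falls
    index: int                                  # number of the word on that page
    bbox: Tuple[float, float, float, float]     # bounding rectangle (left, top, right, bottom)
    font: str = "Times"                         # visual attributes (illustrative defaults)
    size_pt: float = 10.0
    bold: bool = False
    italic: bool = False

    @property
    def width(self) -> float:
        return self.bbox[2] - self.bbox[0]

    @property
    def height(self) -> float:
        return self.bbox[3] - self.bbox[1]


def words_for_page(words: Iterable[WordRecord], page: int) -> List[WordRecord]:
    """Return the words of one page ordered by their number on the page."""
    return sorted((w for w in words if w.page == page), key=lambda w: w.index)
```

With these records, a renderer can place each word at its bounding rectangle and apply its font attributes, which is the information the text says is needed to reproduce the native appearance.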
- HTML rendering engine 212 creates the raster image in a raster format which is displayable by Web browser 238.
- The raster image is created from the data obtained from native file format processor 236, and preserves substantially all of the visual attributes of native file format document 240, or a portion thereof, as seen in the native document appearance.
- The raster format is one which is supported by Web browsers.
- One example of such a format is the GIF raster format.
- The raster image, containing at least a portion of native file format document 240, is displayable by Web browsers.
- The raster image is optionally created "on the fly".
- Alternatively, a raster image could be stored in an additional database 242 containing cached raster images, rather than being created "on the fly".
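The two options above (render on the fly, or serve from cached raster images) can be combined in one lookup. In this sketch an in-memory dictionary stands in for database 242, keying images by `(document id, page)` is an assumption, and `render` is any caller-supplied function producing image bytes (for example GIF data).

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]   # (document id, page number): an assumed cache key


class RasterImageSource:
    """Serve raster images from a cache when present, otherwise render on the fly."""

    def __init__(self, render: Callable[[str, int], bytes], use_cache: bool = True) -> None:
        self._render = render                 # produces image bytes for one page
        self._use_cache = use_cache
        self._cache: Dict[Key, bytes] = {}    # stands in for database 242
        self.renders = 0                      # how many times rendering actually ran

    def get(self, doc_id: str, page: int) -> bytes:
        key = (doc_id, page)
        if self._use_cache and key in self._cache:
            return self._cache[key]
        image = self._render(doc_id, page)
        self.renders += 1
        if self._use_cache:
            self._cache[key] = image
        return image
```

Setting `use_cache=False` gives the purely "on the fly" behaviour. With caching enabled, repeated requests for the same page avoid re-rendering.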
- If the raster image is produced as a result of a search request by the user, then preferably at least one "match" or search result is displayed in the context of at least a portion of at least one native file format document 240 containing the match, as shown in Figure 11.
- Figure 11 shows an exemplary, illustrative depiction of a portion of a computer monitor screen which is displaying the raster images of two matches.
- A monitor screen 244 is displaying a portion of the graphic output of Web browser 238, here shown as Netscape Navigator™, although substantially any Web browser could be used.
- A command area 246 enables the user to enter commands to HTML rendering engine 212 through Web browser 238.
- A display area 248 shows a portion of the results from the search.
- Display area 248 shows a portion of two documents 250 with the searched term, "Keppel", emphasized, in this example by a box.
- Both graphic images and text are displayable in display area 248.
- The results of the search are displayed by Web browser 238, preferably within the context of at least a portion of the original document, as shown.
- The list of matches may include a plurality of matches within a single native file format document 240, a single match from each of a plurality of native file format documents 240, or even a plurality of matches from a plurality of native file format documents 240.
- Web browser 238 can then request the next match in the series of matches, or else the previous match in the series.
- HTML rendering engine 212 then creates the raster image of the desired match in the series.
- Alternatively, the raster image could be stored in database 242, rather than being created "on the fly".
- The raster image of the desired match is transferred to, and then displayed by, Web browser 238.
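Navigating the series of matches with "next" and "previous" requests, possibly across several documents, can be sketched as a simple cursor. The `Match` fields and the ordering by document, page and word position are assumptions for illustration; each match returned would then be handed to the raster-image step described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Match:
    doc_id: str
    page: int
    word_index: int


class MatchSeries:
    """Ordered series of search matches, possibly spanning several documents."""

    def __init__(self, matches: List[Match]) -> None:
        # Assumed ordering: by document, then page, then position on the page.
        self._matches = sorted(matches, key=lambda m: (m.doc_id, m.page, m.word_index))
        self._pos = -1                      # before the first match

    def next(self) -> Optional[Match]:
        if self._pos + 1 >= len(self._matches):
            return None                     # already at the last match
        self._pos += 1
        return self._matches[self._pos]

    def previous(self) -> Optional[Match]:
        if self._pos <= 0:
            return None                     # already at (or before) the first match
        self._pos -= 1
        return self._matches[self._pos]
```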
- HTML rendering engine 212 is also able to render information components as a DHTML document.
- HTML rendering engine 212 converts the data for each primitive of each information component into equivalent fragments in DHTML format.
- The data elements in the source information component are graphic elements and text elements.
- The graphic elements are converted to raster images (in GIF format).
- The text elements are converted to a set of DHTML `<DIV>` blocks.
- There are two types of DHTML `<DIV>` blocks: a style block and a value block.
- The style block defines the style attributes of the text, such as font size and name, font-weight, font-style and color.
- The value block defines the position of the text element within the current primitive and its text value. When the text value contains more than one word, the text value is inserted into a `<NOBR>` block to prevent the Web browser from breaking the line within the given text element.
- The DHTML fragment is optimized to ensure that each style block with specific characteristics appears only once in the DHTML fragment for the primitive.
- Exact correspondence between the source document text style and the DHTML style is not always possible.
- Therefore, the original fonts are preferably substituted by the fonts available to the Web browser, with possible modifications of font size.
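A sketch of the text conversion described above follows. The text does not say how a style block and its value blocks are related in the markup. This sketch assumes that each distinct style block wraps the value blocks that use it, which guarantees that every style block appears only once per primitive. The tuple layouts and the `FONT_MAP` substitution table are hypothetical. The absolutely positioned value blocks assume the caller wraps the fragment in a positioned container for the primitive.

```python
from html import escape
from typing import Dict, List, Tuple

# Style = (font name, size in pt, font-weight, font-style, color); layouts are assumptions.
Style = Tuple[str, int, str, str, str]
TextElement = Tuple[str, int, int, Style]      # (text, x, y, style) within the primitive

# Hypothetical substitution table: source font -> font available to the browser.
FONT_MAP: Dict[str, str] = {"Times-Roman": "Times New Roman", "Helvetica": "Arial"}


def text_fragment(elements: List[TextElement]) -> str:
    """Emit one style block per distinct style, each wrapping its positioned value blocks."""
    groups: Dict[Style, List[Tuple[str, int, int]]] = {}
    for text, x, y, style in elements:          # dicts keep first-seen style order
        groups.setdefault(style, []).append((text, x, y))

    out = []
    for (font, size, weight, fstyle, color), values in groups.items():
        family = FONT_MAP.get(font, font)       # substitute a browser-available font
        css = (f"font-family:'{family}';font-size:{size}pt;"
               f"font-weight:{weight};font-style:{fstyle};color:{color}")
        out.append(f'<DIV style="{css}">')      # style block, emitted once per style
        for text, x, y in values:
            value = escape(text)
            if len(text.split()) > 1:           # multi-word value: prevent line breaks
                value = f"<NOBR>{value}</NOBR>"
            out.append(f'<DIV style="position:absolute;left:{x}px;top:{y}px">{value}</DIV>')
        out.append("</DIV>")
    return "\n".join(out)
```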
- DHTML representations can be created for complete pages as well as for parts of any page.
- First, the relevant primitives are obtained. These include the DHTML data, the enclosing rectangle for the primitive, and the text coordinate mapping (for drawing search "hits" or results, or for otherwise highlighting or emphasizing a portion of text).
- HTML rendering engine 212 iterates over these primitives and, for each one, generates the DHTML code that positions it in the proper place on the page.
- Graphical elements are handled by creating a combination of `<DIV>` and `<IMG>` tags which point to a URL from which the images are loaded directly.
- The search hits are also displayable as part of a DHTML view.
- Hits are created by adding `<DIV>` and `<IMG>` tags, prior to the primitive DHTML, which point to the URL of a predefined small image containing the hit color.
- The hits can be indicated by one of three coloring methods: marking the word; marking the beginning of the line; and marking the entire line.
- The size of the coloring which indicates the hit can be adjusted to the proper size by using the text coordinate mapping of the primitive.
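The page assembly and hit marking described above can be sketched as two helpers. Several details are assumptions: `HIT_IMAGE_URL` is a made-up path, the word and line rectangles are taken to be relative to the primitive's enclosing rectangle, and "marking the beginning of the line" is rendered as a narrow bar of width `marker_width`. Emitting the hit markers before the primitive's DHTML, as the text specifies, also means the text is drawn on top of the colored marker.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]     # (left, top, right, bottom) in pixels

HIT_IMAGE_URL = "/images/hit.gif"    # hypothetical small image in the hit color


def hit_marker(word: Rect, line: Rect, method: str, marker_width: int = 4) -> str:
    """A <DIV>/<IMG> pair painting one hit, sized from the text coordinate mapping."""
    if method == "word":
        left, top, right, bottom = word
    elif method == "line_start":
        left, top, _, bottom = line
        right = left + marker_width           # narrow bar at the beginning of the line
    elif method == "line":
        left, top, right, bottom = line
    else:
        raise ValueError(f"unknown coloring method {method!r}")
    return (f'<DIV style="position:absolute;left:{left}px;top:{top}px">'
            f'<IMG src="{HIT_IMAGE_URL}" width="{right - left}" height="{bottom - top}"></DIV>')


def place_primitive(rect: Rect, dhtml: str, hits: List[str]) -> str:
    """Position one primitive on the page, emitting its hit markers before its DHTML."""
    left, top, right, bottom = rect
    return (f'<DIV style="position:absolute;left:{left}px;top:{top}px;'
            f'width:{right - left}px;height:{bottom - top}px">'
            + "".join(hits) + dhtml + "</DIV>")
```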
- Section 5 of Chapter II above described the general implementation of IC Contributor.
- This section describes a specific, preferred implementation of IC Contributor for operation with Java Bean objects.
- The Java Bean object has two groups of characteristics: properties and methods. Properties are descriptive features of the Java Bean object.
- Such features preferably include the OMS (Object Mapping Structure), which is the text, structure, graphics and APPS intelligence of the Java Bean object.
- APPS intelligence is applicable only if the original document was a paper document scanned into an electronic file, since APPS stands for "Adaptive Probability Pattern Search", which enables text to be searched in an image even if it was not correctly recognized by the OCR process described previously.
- The OMS contains information related to the overall structure of the Java Bean object, as well as a description of the relationships between different portions of that object.
- The profile information is also included.
- The profile information includes any additional desired characteristics of the original document. These characteristics are determined by the user through IC Content Capturer 52.
- For example, the profile information could include data concerning the type of company which published the original document.
- The profile information is external to the original document and is added according to the specification of IC Content Capturer 52.
- Other preferred properties include an optional but preferable object image, which is a visual image of the original document.
- Another preferred property is hyperlink information, which describes all connections to locations on the World Wide Web.
- A description of the relationships between this component and other components is also provided.
- Security and access control data is also provided, which determines who is allowed to access the information.
- Methods determine the ways in which the data and properties of the information component can be manipulated. These methods are standard for the Java Bean component architecture. For example, methods include ways to access the data, whether as an image, a video clip, a sound and so forth. Methods also include an application interface, so that another application would be able to interact with the information component and with the stored data, and with a GUI (graphical user interface). Other methods pertain to access control and to event handling. Event handling, as noted previously in Section 1, is the mechanism for Java Bean components to broadcast events and to have those events delivered to an appropriate component or components for notification. Thus, event handling provides methods for communication between components packaged as Java Beans.
- The information component is preferably packaged as a Java Bean by using the JAR file format.
- The JAR file format includes such information as the class file, images, sounds and links to other components.
- The class file is a description of the information class to which the information component belongs.
- Each such piece of information is stored in the JAR file format as a pointer to the storage location of the relevant data, such as an image, for example.
- The JAR file format wraps additional information and data around the information component, in such a way that all of the information and data is presented as a single, independent entity, yet is readily accessible to other software objects.
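A JAR file is a ZIP archive with a `META-INF/MANIFEST.MF` entry, and the JavaBeans convention marks a bean class with a `Java-Bean: True` attribute in that class's manifest section. The sketch below builds such an archive with Python's standard `zipfile` module. The class path `ic/InformationComponent.class`, the placeholder class bytes and the `pointers.properties` file (holding pointers to where images, sounds and linked components are stored) are all hypothetical. This is not how the patent's system necessarily lays out its JAR files.

```python
import io
import zipfile
from typing import Dict


def package_as_jar(class_bytes: bytes, pointers: Dict[str, str]) -> bytes:
    """Build an in-memory JAR: a ZIP archive with a META-INF/MANIFEST.MF entry.

    The class file is stored directly, while images, sounds and links are stored only
    as pointers to their storage locations, in a hypothetical pointers.properties file.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF",
                     "Manifest-Version: 1.0\r\n"
                     "\r\n"
                     "Name: ic/InformationComponent.class\r\n"
                     "Java-Bean: True\r\n\r\n")
        jar.writestr("ic/InformationComponent.class", class_bytes)
        jar.writestr("ic/pointers.properties",
                     "".join(f"{key}={location}\n" for key, location in sorted(pointers.items())))
    return buf.getvalue()
```

Calling `package_as_jar(b"\xca\xfe\xba\xbe", {"image": "db://images/42"})` produces a single archive that other software can open with any ZIP or JAR reader. The class bytes here are only a placeholder, not a compiled class.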
- This section describes a specific implementation of IC Server 62, described in Chapter II, Section 6, as IC JBC Server 300 for operation with Java Bean objects in a CORBA environment.
- IC JBC Server 300 locates the original information entity, isolates the corresponding information component (according to a pointer stored in the Java Bean component, for example), and then creates an "object image clip". The object image clip is then sent back to the client application as an HTML file.
- IC JBC Server 300 includes a database 302.
- Database 302 is both accessible to, and managed by, an IC Manager 304.
- IC Manager 304 is responsible for supplying the main CORBA services, as described in Section 1.
- IC Manager 304 provides these services by being adapted to the main proprietary ORB models which are available, such as the "Cartridge" model of Oracle Corp. (California, USA) or the "Blade" model of Informix™.
- An ORB is an Object Request Broker, preferably a WRB, or ORB for the "Cartridge" model, which is able to communicate with individual cartridges.
- The main CORBA services include database and indexing services for search and retrieval engines and for push applications; database navigation services; distributed viewing, imaging and printing services for the information components; network control and retrieval services; and distributed storage services for information components.
- If IC Manager 304 is adapted to the "Cartridge" model, then components are accessed from database 302 through one of a number of cartridges, shown as at least one cartridge 306.
- Each cartridge 306 is a module of software which performs a specific function.
- Each of the previously described services performed by IC Manager 304 is provided by a separate cartridge 306.
- For example, different cartridges 306 could provide database indexing, database navigation and information component retrieval services, without requiring cartridge 306 and database 302 to be on the same server computer.
- Thus, IC JBC Server 300 is not necessarily a single server computer, but rather an interacting collection of components which together form IC JBC Server 300.
- Cartridges 306 would communicate with each other and with any databases through an ORB.
- One advantage of the "Cartridge" model is that communication between different computers could occur through the World Wide Web, via an HTTP daemon as described in Section 1.
- Cartridges 306 are named with a combination of the IP address of the server where cartridge 306 is located and the virtual path to the location of cartridge 306 on that server.
- IC Manager 304 would preferably be composed of a number of different cartridges 306, on one server computer or a plurality of server computers, which preferably interact with each other and with any other necessary components, such as databases, through the World Wide Web.
- IC JBC Server 300 also includes web application server 308, which enables IC Manager 304 to send requests and receive information through the Internet.
- IC Manager 304 and web application server 308 enable specific information components to be retrieved by first activating a particular cartridge 306 and then performing some action through database 302.
- For example, the name of a desired cartridge 306 can be given to IC Manager 304, which then locates and activates the desired cartridge, through the Internet if necessary.
- Once cartridge 306 has been activated, it performs a specific function, such as retrieving an information component from database 302, for example. The information component can then be distributed through web application server 308.
- Other mechanisms by which IC Manager 304 can interact with web application server 308, and give the component to web application server 308, could also be used.
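Naming cartridges by server IP address plus virtual path, and activating them by name, can be sketched as a small registry. This is only a local illustration: `ICManager` here is a hypothetical class, the cartridges are plain Python callables rather than remote modules reached through an ORB or HTTP daemon, and the IP address and paths in the example are made up.

```python
from typing import Callable, Dict


class ICManager:
    """Locate and activate cartridges named by server IP address plus virtual path."""

    def __init__(self) -> None:
        self._cartridges: Dict[str, Callable[..., object]] = {}

    @staticmethod
    def cartridge_name(ip: str, virtual_path: str) -> str:
        # e.g. ("10.0.0.5", "/ic/retrieve") -> "10.0.0.5/ic/retrieve"
        return ip + "/" + virtual_path.lstrip("/")

    def register(self, ip: str, virtual_path: str, cartridge: Callable[..., object]) -> str:
        name = self.cartridge_name(ip, virtual_path)
        self._cartridges[name] = cartridge
        return name

    def activate(self, name: str, *args: object) -> object:
        # A real deployment would reach a remote cartridge through the ORB or HTTP;
        # in this sketch cartridges are local callables.
        if name not in self._cartridges:
            raise LookupError(f"unknown cartridge {name}")
        return self._cartridges[name](*args)
```

In this sketch, a retrieval cartridge registered under `10.0.0.5/ic/retrieve` could be activated by that name to fetch a component, whose result would then be handed to the web application server.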
- The desired original information is sent to one of a plurality of image processors 310. Each image processor 310 transforms the original information, such as a document, into a Searchable Image Format (SIF) file.
- Each information format preferably has its own image processor 310, so that, for example, a first image processor 310 could manipulate text editor documents, while a second image processor 310 might handle graphics files such as TIF (Tagged Image File Format) or GIF (Graphics Interchange Format) files.
- Each image processor 310 is optionally and preferably able to transform the "original" information into the corresponding SIF file "on the fly".
- Thus, the SIF file can be created and recreated as needed, without the requirement of storing both the SIF file and the "original" information.
- SIF files are preferably image files, most preferably fully compatible with the TIF file format, which incorporate both graphic images and information data stored in a separate text file, as well as the structure which relates the graphic and textual information within the original document.
- SIF files include a header section for general information about the file, such as the image resolution; the digital graphic image, stored in a conventional raster format; information relating to individual words or elements of the image file; and administrative information, which contains the relational structure of the image and textual elements.
- The information relating to individual words includes not only the text of the words, but also the data generated by the OCR technology regarding unidentified characters and probable errors (APPS), if the original document was an electronically scanned paper document. Thus, any search of the textual information can compensate for these unidentified characters and errors.
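The four parts of a SIF file listed above (header, raster image, per-word information and administrative structure) can be modelled as a record, with a search that tolerates unidentified OCR characters. This is a simplified stand-in and explicitly not APPS, whose algorithm the text does not give: here an unidentified character is assumed to be stored as `?` and to match any single character. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SifWord:
    text: str                                   # OCR text; "?" marks an unidentified character
    bbox: Tuple[int, int, int, int]             # position of the word in the raster image


@dataclass
class SifFile:
    header: Dict[str, object]                   # general information, e.g. {"dpi": 300}
    raster: bytes                               # the digital graphic image in a raster format
    words: List[SifWord] = field(default_factory=list)
    admin: Dict[str, List[int]] = field(default_factory=dict)  # structure, e.g. "table-1" -> word indices


def tolerant_match(term: str, ocr_text: str) -> bool:
    """Case-insensitive match in which "?" in the OCR text stands for any one character."""
    if len(term) != len(ocr_text):
        return False
    return all(o == "?" or o.lower() == t.lower() for t, o in zip(term, ocr_text))


def search(sif: SifFile, term: str) -> List[int]:
    """Return the indices of the words that match the search term."""
    return [i for i, w in enumerate(sif.words) if tolerant_match(term, w.text)]
```

A search for "keppel" would then also find an OCR result such as "Kep?el", which an exact comparison would miss.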
- The actual SIF file is assembled from the basic document elements which were described in Sections 2 and 5.
- The SIF file is assembled "on the fly" by image processor 310, and can then be distributed to a client through web application server 308.
- The SIF file would include the text and images from an original document, for example.
- The client application issues a request for information by sending a polygon to IC JBC Server 300.
- This polygon would define the geometrical location of the desired information within a document.
- The polygon could first be obtained as the result of a search through IC Manager 304, for example. Once obtained, the polygon would enable IC JBC Server 300 to determine exactly which information to package into the object image clip. For example, the client application might only want to retrieve a single table from a newsletter.
- The appropriate polygon would be sent to IC JBC Server 300, which would then pass the request to the appropriate image processor 310.
- The table would then be sent as an object image clip to the client application.
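One way to decide which information a requested polygon covers is a point-in-polygon test on each word's bounding rectangle. The sketch below uses the standard even-odd (ray casting) test on the centre of each rectangle. That selection rule is an assumption for illustration, since the text does not say how the image processor interprets the polygon. The function names and the `(left, top, right, bottom)` rectangle convention are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd test: count polygon edges crossed by a ray from p towards +x."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                          # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                               # crossing lies to the right of p
                inside = not inside
    return inside


def clip_words(words: List[Tuple[str, Rect]], polygon: Sequence[Point]) -> List[str]:
    """Select the words whose bounding-rectangle centre lies inside the requested polygon."""
    chosen = []
    for text, (left, top, right, bottom) in words:
        if point_in_polygon(((left + right) / 2, (top + bottom) / 2), polygon):
            chosen.append(text)
    return chosen
```

For the newsletter example, the polygon would outline the table, and only the words inside it would be packaged into the object image clip.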
- Thus, the original document would be stored in its entirety, but could then be retrieved as an individual component or components, if desired.
- IC JBC Server 300 preferably also includes a view server 312 and a print server 314.
- View server 312 and print server 314 may provide these services by being adapted to the main proprietary ORB models which are available, such as the "Cartridge" model of Oracle Corp. (California, USA) or the "Blade" model of Informix™.
- Print server 314 preferably allows high-quality on-demand printing of the original document in a platform-independent manner. Each separate printing service is provided as a cartridge if the "Cartridge" model is used.
- View server 312 provides the appropriate image application services to IC Manager 304, such as services related to the display of an image on a computer screen through a GUI, for example. View server 312 could also provide each service as a cartridge if the "Cartridge" model is used.
- Section 3 Client as a Web Browser
- the GUI could be an HTML (hypertext mark-up language) interface, such that IC client 98 is a Web browser-type software application. This is for the purposes of illustration only and is not meant to be limiting in any way.
- an HTML rendering engine such as that described in Chapter EL, Section 4 could be used.
- an HTML interface 316 displays the Web page.
- HTML interface 316 could be a Web browser, for example.
- the Web page which is displayed is customizable for a particular user by an HTML customization module 318.
- Java components 320 can also be provided to client 98.
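The OCR-tolerant search described above, in which stored data about unidentified characters and probable errors lets a search compensate for recognition mistakes, could work roughly as follows. This is a minimal sketch, not the patent's method: the `OcrWord` record, the `UNKNOWN` placeholder character, and the rule that a mismatch is tolerated only at a flagged position are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Hypothetical marker for a character the OCR engine could not identify.
UNKNOWN = "\ufffd"


@dataclass
class OcrWord:
    text: str
    # Hypothetical: indices of characters the OCR engine flagged as probable errors.
    suspect: FrozenSet[int] = field(default_factory=frozenset)


def may_match(query: str, word: OcrWord) -> bool:
    """Return True if `query` could be the true reading of the OCR'd `word`."""
    q, w = query.lower(), word.text.lower()
    if len(q) != len(w):
        return False  # this sketch does not model inserted or dropped characters
    for i, (qc, wc) in enumerate(zip(q, w)):
        if qc == wc or wc == UNKNOWN:
            continue  # exact match, or an unidentified character acting as a wildcard
        if i in word.suspect:
            continue  # tolerate a mismatch at a position flagged as a probable error
        return False
    return True


def search(query: str, words: List[OcrWord]) -> List[int]:
    """Return the positions of all stored words that may match `query`."""
    return [i for i, word in enumerate(words) if may_match(query, word)]
```

For example, with `words = [OcrWord("retr\ufffdeve"), OcrWord("0riginal", frozenset({0})), OcrWord("table")]`, `search("retrieve", words)` returns `[0]` and `search("original", words)` returns `[1]`. Recognition errors that change a word's length (such as "rn" read as "m") would need an edit-distance comparison instead.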
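The polygon-based retrieval described above, in which the client application sends a polygon to IC JBC Server 300 and receives only the enclosed information (a single table from a newsletter, for example) as an object image clip, could be approximated as shown here. This is a sketch under stated assumptions: the `Element` records, their bounding boxes, and the rule of selecting an element when the centre of its box lies inside the polygon are hypothetical, and the point-in-polygon test is the standard even-odd ray-casting method rather than anything the patent specifies.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class Element:
    name: str
    box: Box


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray running right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y1 != y2 here
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_elements(poly: List[Point], elements: List[Element]) -> List[Element]:
    """Keep the elements whose bounding-box centre lies inside the requested polygon."""
    chosen = []
    for e in elements:
        x0, y0, x1, y1 = e.box
        if point_in_polygon(((x0 + x1) / 2, (y0 + y1) / 2), poly):
            chosen.append(e)
    return chosen


def clip_rect(poly: List[Point]) -> Box:
    """Axis-aligned rectangle that would be rasterised for the object image clip."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, given a headline at (0, 0, 600, 50), a table at (50, 200, 300, 400) and a footer at (0, 750, 600, 800), the polygon `[(40, 190), (310, 190), (310, 410), (40, 410)]` selects only the table, and `clip_rect` returns `(40, 190, 310, 410)`.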
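The arrangement described above, in which view server 312 and print server 314 provide each separate service as a "cartridge", amounts to a plug-in registry. The sketch below illustrates that idea only. It does not reproduce the API of Oracle's Cartridge model or Informix's Blade model; `ServiceServer`, the `cartridge` decorator, and the `"raw"` service are hypothetical names.

```python
from typing import Callable, Dict

ServiceFn = Callable[[bytes], bytes]


class ServiceServer:
    """Hypothetical host for independently registered services ("cartridges")."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._services: Dict[str, ServiceFn] = {}

    def cartridge(self, service_name: str) -> Callable[[ServiceFn], ServiceFn]:
        """Decorator that plugs a service function into this server under a name."""
        def register(fn: ServiceFn) -> ServiceFn:
            if service_name in self._services:
                raise ValueError(f"{service_name!r} is already registered on {self.name}")
            self._services[service_name] = fn
            return fn
        return register

    def call(self, service_name: str, payload: bytes) -> bytes:
        """Dispatch a request to the named service."""
        if service_name not in self._services:
            raise LookupError(f"{self.name} has no service {service_name!r}")
        return self._services[service_name](payload)


print_server = ServiceServer("print server 314")


@print_server.cartridge("raw")
def raw_print(document: bytes) -> bytes:
    # Stand-in only: a real printing service would render the document for a device.
    return b"PRINT:" + document
```

Because every service registers itself under a name, a new printing or viewing service can be added without changing the server that dispatches requests to it.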
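Per-user customization of the displayed Web page by HTML customization module 318 could, for example, be done by filling a page template from stored user preferences. This is a minimal sketch assuming a hypothetical template, hypothetical preference keys (`title`, `font_size`, `greeting`) and hypothetical defaults; the patent does not describe the markup the module produces. It uses Python's standard `string.Template` and `html.escape`.

```python
from html import escape
from string import Template
from typing import Dict, List, Optional

# Hypothetical page layout; the markup produced by module 318 is not specified.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body style='font-size:${font_size}pt'><h1>$greeting</h1>$results</body></html>"
)

# Hypothetical defaults used when a user has no stored preference.
DEFAULTS = {"title": "IC Manager", "font_size": "12", "greeting": "Search results"}


def render_page(results: List[str], prefs: Optional[Dict[str, str]] = None) -> str:
    """Fill the page template with per-user preferences and a list of results."""
    values = {**DEFAULTS, **(prefs or {})}  # user preferences override the defaults
    values = {key: escape(value) for key, value in values.items()}  # escape user text
    items = "".join(f"<li>{escape(r)}</li>" for r in results)
    values["results"] = f"<ul>{items}</ul>"
    return PAGE.substitute(values)
```

For instance, `render_page(["Q3 table"], {"greeting": "Hello, Ann"})` yields a page whose heading is `<h1>Hello, Ann</h1>` while the title and font size fall back to the defaults.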
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU13715/99A AU1371599A (en) | 1997-10-31 | 1998-11-02 | Information component management system |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US96171497A | 1997-10-31 | 1997-10-31 | |
US08/961,714 | 1997-10-31 | ||
US08/962,117 | 1997-10-31 | ||
US08/962,117 US6161107A (en) | 1997-10-31 | 1997-10-31 | Server for serving stored information to client web browser using text and raster images |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999023584A2 (en) | 1999-05-14 |
WO1999023584A3 WO1999023584A3 (en) | 1999-09-10 |
Family ID: 27130428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/023193 (WO1999023584A2) | Information component management system | 1997-10-31 | 1998-11-02 |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU1371599A (en) |
WO (1) | WO1999023584A2 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1074927A2 (en) * | 1999-08-05 | 2001-02-07 | The Boeing Company | Intelligent wiring diagram system |
WO2001024053A2 (en) * | 1999-09-28 | 2001-04-05 | Xmlexpress, Inc. | System and method for automatic context creation for electronic documents |
WO2001035617A2 (en) * | 1999-10-29 | 2001-05-17 | Telera, Inc. | Distributed call center with local points of presence |
EP1122652A1 (en) * | 2000-02-03 | 2001-08-08 | Mitsubishi Denki Kabushiki Kaisha | Data Integration system |
WO2001092986A2 (en) * | 2000-05-26 | 2001-12-06 | Newsstand, Inc. | Providing a digital version of a mass-produced printed paper |
WO2002084638A1 (en) * | 2001-04-10 | 2002-10-24 | Presedia, Inc. | System, method and apparatus for converting and integrating media files |
US6795819B2 (en) * | 2000-08-04 | 2004-09-21 | Infoglide Corporation | System and method for building and maintaining a database |
US6839714B2 (en) * | 2000-08-04 | 2005-01-04 | Infoglide Corporation | System and method for comparing heterogeneous data sources |
US6845273B1 (en) | 2000-05-26 | 2005-01-18 | Newsstand, Inc. | Method and system for replacing content in a digital version of a mass-produced printed paper |
US6850260B1 (en) | 2000-05-26 | 2005-02-01 | Newsstand, Inc. | Method and system for identifying a selectable portion of a digital version of a mass-produced printed paper |
EP1569134A1 (en) * | 2004-02-24 | 2005-08-31 | Sap Ag | A computer system, a database for storing electronic data and a method to operate a database system for converting and displaying archived data |
US7181679B1 (en) | 2000-05-26 | 2007-02-20 | Newsstand, Inc. | Method and system for translating a digital version of a paper |
US7447771B1 (en) | 2000-05-26 | 2008-11-04 | Newsstand, Inc. | Method and system for forming a hyperlink reference and embedding the hyperlink reference within an electronic version of a paper |
US7953712B2 (en) | 2004-02-24 | 2011-05-31 | Sap Ag | Computer system, a database for storing electronic data and a method to operate a database system |
EP4064075A1 (en) * | 2021-03-26 | 2022-09-28 | FUJIFILM Business Innovation Corp. | Information processing apparatus, program, and information processing method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5818446A (en) * | 1996-11-18 | 1998-10-06 | International Business Machines Corporation | System for changing user interfaces based on display data content |
1998
- 1998-11-02 WO PCT/US1998/023193 patent/WO1999023584A2/en active Application Filing
- 1998-11-02 AU AU13715/99A patent/AU1371599A/en not_active Abandoned
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1074927A3 (en) * | 1999-08-05 | 2003-08-27 | The Boeing Company | Intelligent wiring diagram system |
EP1074927A2 (en) * | 1999-08-05 | 2001-02-07 | The Boeing Company | Intelligent wiring diagram system |
WO2001024053A2 (en) * | 1999-09-28 | 2001-04-05 | Xmlexpress, Inc. | System and method for automatic context creation for electronic documents |
WO2001024053A3 (en) * | 1999-09-28 | 2004-03-25 | Xmlexpress Inc | System and method for automatic context creation for electronic documents |
WO2001035617A2 (en) * | 1999-10-29 | 2001-05-17 | Telera, Inc. | Distributed call center with local points of presence |
WO2001035617A3 (en) * | 1999-10-29 | 2002-09-12 | Telera Inc | Distributed call center with local points of presence |
EP1122652A1 (en) * | 2000-02-03 | 2001-08-08 | Mitsubishi Denki Kabushiki Kaisha | Data Integration system |
US6810429B1 (en) | 2000-02-03 | 2004-10-26 | Mitsubishi Electric Research Laboratories, Inc. | Enterprise integration system |
US7900130B1 (en) | 2000-05-26 | 2011-03-01 | Libredigital, Inc. | Method, system and computer program product for embedding a hyperlink within a version of a paper |
US7181679B1 (en) | 2000-05-26 | 2007-02-20 | Newsstand, Inc. | Method and system for translating a digital version of a paper |
US9122661B2 (en) | 2000-05-26 | 2015-09-01 | Libredigital, Inc. | Method, system and computer program product for providing digital content |
US9087026B2 (en) | 2000-05-26 | 2015-07-21 | Libredigital, Inc. | Method, system and computer program product for providing digital content |
WO2001092986A3 (en) * | 2000-05-26 | 2002-03-28 | Newsstand Inc | Providing a digital version of a mass-produced printed paper |
US9087027B2 (en) | 2000-05-26 | 2015-07-21 | Libredigital, Inc. | Method, system and computer program product for providing digital content |
GB2381350B (en) * | 2000-05-26 | 2005-01-12 | Newsstand Inc | Method, system, and computer program product for providing a digital version of a mass-produced printed paper |
US6845273B1 (en) | 2000-05-26 | 2005-01-18 | Newsstand, Inc. | Method and system for replacing content in a digital version of a mass-produced printed paper |
US6850260B1 (en) | 2000-05-26 | 2005-02-01 | Newsstand, Inc. | Method and system for identifying a selectable portion of a digital version of a mass-produced printed paper |
US8438466B2 (en) | 2000-05-26 | 2013-05-07 | Libredigital, Inc. | Method, system and computer program product for searching an electronic version of a paper |
US8352849B2 (en) | 2000-05-26 | 2013-01-08 | Libredigital, Inc. | Method, system and computer program product for providing digital content |
GB2381350A (en) * | 2000-05-26 | 2003-04-30 | Newsstand Inc | Method system and computer program product for providing a digital version of a mass-produced printed paper |
US7447771B1 (en) | 2000-05-26 | 2008-11-04 | Newsstand, Inc. | Method and system for forming a hyperlink reference and embedding the hyperlink reference within an electronic version of a paper |
WO2001092986A2 (en) * | 2000-05-26 | 2001-12-06 | Newsstand, Inc. | Providing a digital version of a mass-produced printed paper |
US8332742B2 (en) | 2000-05-26 | 2012-12-11 | Libredigital, Inc. | Method, system and computer program product for providing digital content |
US8055994B1 (en) | 2000-05-26 | 2011-11-08 | Libredigital, Inc. | Method, system and computer program product for displaying a version of a paper |
US6839714B2 (en) * | 2000-08-04 | 2005-01-04 | Infoglide Corporation | System and method for comparing heterogeneous data sources |
US6795819B2 (en) * | 2000-08-04 | 2004-09-21 | Infoglide Corporation | System and method for building and maintaining a database |
US7039643B2 (en) * | 2001-04-10 | 2006-05-02 | Adobe Systems Incorporated | System, method and apparatus for converting and integrating media files |
WO2002084638A1 (en) * | 2001-04-10 | 2002-10-24 | Presedia, Inc. | System, method and apparatus for converting and integrating media files |
US7953712B2 (en) | 2004-02-24 | 2011-05-31 | Sap Ag | Computer system, a database for storing electronic data and a method to operate a database system |
EP1569134A1 (en) * | 2004-02-24 | 2005-08-31 | Sap Ag | A computer system, a database for storing electronic data and a method to operate a database system for converting and displaying archived data |
EP4064075A1 (en) * | 2021-03-26 | 2022-09-28 | FUJIFILM Business Innovation Corp. | Information processing apparatus, program, and information processing method |
Also Published As
Publication number | Publication date |
---|---|
WO1999023584A3 (en) | 1999-09-10 |
AU1371599A (en) | 1999-05-24 |
Similar Documents
Publication | Title |
---|---|
US6161107A (en) | Server for serving stored information to client web browser using text and raster images |
US6249794B1 (en) | Providing descriptions of documents through document description files |
AU764320B2 (en) | Information storage and retrieval system for storing and retrieving the visual form of information from an application in a database |
US6401097B1 (en) | System and method for integrated document management and related transmission and access |
US7168034B2 (en) | Method for promoting contextual information to display pages containing hyperlinks |
US6832351B1 (en) | Method and system for previewing and printing customized business forms |
US6169547B1 (en) | Method for displaying an icon of media data |
US7177949B2 (en) | Template architecture and rendering engine for web browser access to databases |
US6721921B1 (en) | Method and system for annotating documents using an independent annotation repository |
US7058944B1 (en) | Event driven system and method for retrieving and displaying information |
US20050182755A1 (en) | Systems and methods for analyzing documents over a network |
US7406664B1 (en) | System for integrating HTML Web site views into application file dialogs |
US7240294B2 (en) | Method of constructing a composite image |
US7106469B2 (en) | Variable data printing with web based imaging |
US7007231B2 (en) | Document management system employing multi-zone parsing process |
JP2011138533A (en) | System and method for content delivery over wireless communication medium to portable computing device |
US7213202B1 (en) | Simplified design for HTML |
WO1999023584A2 (en) | Information component management system |
Merz | Web publishing with Acrobat/PDF |
KR20060101803A (en) | Creating and active viewing method for an electronic document |
US6665090B1 (en) | System and method for creating and printing a creative expression |
JPH09231121A (en) | Document storage device |
EP0843266A2 (en) | Dynamic incremental updating of electronic documents |
Hoppe | Integrated management of technical documentation: the system SPRITE |
Kapidakis | Issues in the Development and Operation of a Digital Library |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
AK | Designated states | Kind code of ref document: A3. Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3. Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
NENP | Non-entry into the national phase | Ref country code: KR |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
122 | EP: PCT application non-entry in European phase | |
NENP | Non-entry into the national phase | Ref country code: CA |