US20080281842A1 - Apparatus and method for pre-processing mapping information for efficient decomposition of xml documents - Google Patents
- Publication number
- US20080281842A1 (application US12/182,075)
- Authority
- US
- United States
- Prior art keywords
- document
- data structure
- mapping
- xml
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
- G06F16/86—Mapping to a database
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Definitions
- the present invention relates generally to databases, and more particularly to pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- Databases are computerized information storage and retrieval systems. There are many different types of databases.
- One particular type of database is a relational database that includes a relational database management system (RDBMS).
- A relational database management system (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data.
- Relational databases are organized into physical tables which consist of rows and columns of data. The rows are formally called “tuples”.
- a database will typically have many physical tables and each physical table will typically have multiple tuples and multiple columns.
- the physical tables are typically stored on direct access storage devices (DASD) such as magnetic or optical disk drives for semi-permanent storage.
- An XML document can be stored in a relational database through a process of decomposition, i.e., breaking the XML document into component pieces (or portions) and storing the component pieces in the relational database.
- the specification of the component pieces and where the component pieces are to be stored in the relational database is typically accomplished through a mapping document.
- the mapping document contains information as to which XML elements/attributes are mapped to which table and column in the relational database.
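As a rough illustration of what such mapping information drives, the following minimal Python sketch turns one XML document into per-table column values. The `MAPPING` dictionary, the `@` prefix for attributes, and the `customer` table are hypothetical stand-ins chosen for this example; they are not the mapping document format described in this specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping information: child element name or "@attribute"
# on the root element -> (table, column).
MAPPING = {
    "name": ("customer", "name"),
    "@id": ("customer", "id"),
}

def decompose(xml_text, mapping):
    """Return {table: {column: value}} for one XML document."""
    root = ET.fromstring(xml_text)
    rows = {}
    for key, (table, column) in mapping.items():
        if key.startswith("@"):
            value = root.get(key[1:])      # attribute on the root element
        else:
            value = root.findtext(key)     # text of a direct child element
        rows.setdefault(table, {})[column] = value
    return rows

rows = decompose('<customer id="42"><name>Ada</name></customer>', MAPPING)
```

Each entry of `rows` corresponds to one tuple that would be inserted into the named table.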
- On each decomposition operation, along with the XML document to be decomposed, the mapping document is also typically supplied by the user.
- the mapping document must be parsed to extract mapping information, which must then be processed and transformed into internal data structures for use during the actual decomposition of the XML document.
- a single mapping document can be used to decompose any instance XML document that conforms to the structure that the mapping document describes. Accordingly, the same mapping document can be used to decompose many XML documents over any time period.
- Conventional techniques for decomposing an XML document may not save the results of the processing of the mapping document and, therefore, the same processing of the mapping document must be repeated each time a different XML document is decomposed.
- Accordingly, what is needed is an improved technique of processing the mapping information which reduces the amount of time required to decompose XML documents.
- The present invention addresses such a need.
- this specification describes a method for pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- the method includes receiving a mapping document that describes how all of (or a portion of) the XML document is to be decomposed, transforming the mapping document into a data structure for decomposing the XML document, and making the data structure persistent for use with a subsequent decomposition operation that decomposes an XML document.
- Making the data structure persistent can include storing the data structure in the database. Storing the data structure can include assigning a unique identifier to the data structure, and using the unique identifier to later retrieve the data structure from the database on a subsequent decomposition of any XML document that conforms to the XML schema. Storing the data structure can include storing the data structure as metadata in the database.
- the mapping document can be in the form of a set of related XML schema documents (also known as an XML schema).
- the data structure can include one or more nodes that represent XML schema components including model groups, particles, or element or attribute declarations, and include one or more edges that connect the one or more nodes according to relationships defined in the XML schema.
- Transforming the mapping document can further include parsing the one or more annotations to obtain the mapping information that maps XML data to the database.
- the method can further include creating one or more second data structures for each mapped table in the database based on the mapping information.
- the method can further include assigning a unique identifier to the set of data structures (i.e., the data structure corresponding to the mapping document and the one or more second data structures as a whole) and storing the set of data structures as metadata in the database for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema.
- information relating the data structure corresponding to the mapping document with the one or more corresponding second data structures can also be stored.
- the database can be a relational database.
- this specification describes a computer program product, tangibly stored on a computer-readable medium, for pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- the product includes instructions to cause a programmable processor to receive a mapping document.
- the mapping document can be in the form of a set of related XML schema documents (also known as an XML schema).
- the mapping document describes how portions of the XML document are to be decomposed.
- the product further includes instructions to transform the mapping document into a data structure for decomposing the XML document, and make the data structure persistent for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema.
- this specification describes a decomposition module for pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- the decomposition module includes an engine operable to receive a mapping document that describes how portions of the XML document are to be decomposed.
- the mapping document can be in the form of a set of related XML schema documents.
- the engine is operable to transform the mapping document into a data structure that can be used for efficient decomposition of the XML document, and make the data structure persistent for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema.
- Implementations may provide one or more of the following advantages.
- the present specification describes techniques for decomposing an XML document that save a significant amount of CPU (processor) cycles as compared to conventional decomposition techniques. Additionally, inadvertent (human) modifications to a mapping document that is supplied to a decomposition operation are avoided, along with subsequent unexplained differences in the expected decomposition results. Since the mapping document is stored in the database, there is a record of the mapping information used to perform a decomposition operation. Such a record can be useful for diagnostics or audit trails.
- FIG. 1 is a block diagram of a data processing system including a decomposition module in accordance with one implementation of the invention.
- FIG. 2 is a block diagram illustrating the decomposition module of FIG. 1 in accordance with one implementation of the invention.
- FIG. 3 illustrates a method for registering a mapping document in accordance with one implementation of the invention.
- FIG. 4 illustrates a method for decomposing an XML document in accordance with one implementation of the invention.
- FIG. 5 is a block diagram of a data processing system suitable for storing and/or executing program code in accordance with one implementation of the invention.
- Implementations of the present invention relate generally to databases, and more particularly to pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- the following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements.
- Various modifications to implementations and the generic principles and features described herein will be readily apparent to those skilled in the art.
- the present invention is not intended to be limited to the implementations shown but is to be accorded the widest scope consistent with the principles and features described herein.
- FIG. 1 illustrates a data processing system 100 in accordance with one implementation of the invention.
- Data processing system 100 includes input and output devices 102 , a programmed computer 104 , and a database 106 .
- Input and output devices 102 can include devices such as a printer, a keyboard, a mouse, a digitizing pen, a display, and the like.
- Programmed computer 104 can be any type of computer system, including for example, a workstation, a desktop computer, a laptop computer, a personal digital assistant (PDA), a cell phone, a network, and so on.
- Database 106 can be a relational database including one or more tables (not shown) for storing data.
- Running on programmed computer 104 is a database management system (DBMS) 108 including a decomposition module 110 .
- the database management system (DBMS) 108 and decomposition module 110 are features of DB2 available from International Business Machines Corporation of Armonk, N.Y.
- database management system (DBMS) 108 and decomposition module 110 use relational techniques for storing and retrieving XML documents from database 106 .
- decomposition module 110 is operable to receive an XML document 112 , decompose XML document 112 into fragmented data, and store the fragmented data internally within database 106 .
- Decomposition of an XML document is the process of breaking the XML document into component pieces and storing the component pieces in, e.g., a relational database.
- the specification of the component pieces and where the component pieces are to be stored in the relational database is typically accomplished through a mapping document (e.g., mapping document 114 ).
- the mapping document is in the form of a set of related XML schema documents (also known as an XML schema) that describe the structure of conforming XML instance documents. See http://www.w3.org/TR/xmlschema-1/ and http://www.w3.org/TR/xmlschema-2/ for the W3C recommendations for the specification of XML schema, which are incorporated herein by reference.
- the set of related XML schema documents can be augmented with annotations that describe the mapping of XML components to tables/columns in, e.g., a relational database.
- Annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema or instance XML documents.
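To make the idea of annotation-borne mapping information concrete, here is a small Python sketch that reads `xs:annotation`/`xs:appinfo` content from a schema and collects a table/column target per declaration. The `urn:example:mapping` namespace and the `m:target` element with `table` and `column` attributes are hypothetical; this specification does not define the annotation vocabulary, and a production system would use a full XML schema processor rather than a plain XML parser.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MAP = "urn:example:mapping"  # hypothetical annotation namespace

SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:m="{MAP}">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string">
          <xs:annotation>
            <xs:appinfo><m:target table="customer" column="name"/></xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string">
        <xs:annotation>
          <xs:appinfo><m:target table="customer" column="id"/></xs:appinfo>
        </xs:annotation>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def extract_mapping(schema_text):
    """Collect {declaration name: (table, column)} from annotated declarations."""
    root = ET.fromstring(schema_text)
    annotation_path = f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{MAP}}}target"
    mapping = {}
    # Attribute declarations are keyed with an "@" prefix (a convention of this sketch).
    for tag, prefix in (("element", ""), ("attribute", "@")):
        for decl in root.iter(f"{{{XS}}}{tag}"):
            target = decl.find(annotation_path)
            if target is not None:
                mapping[prefix + decl.get("name")] = (
                    target.get("table"),
                    target.get("column"),
                )
    return mapping

mapping = extract_mapping(SCHEMA)
```

Declarations without an annotation, such as the outer `customer` element here, are simply left unmapped.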
- the mapping document minimally contains information as to which XML elements/attributes are mapped to which table and column in, for example, a relational database.
- the mapping document can further include: specification of the conditions which the XML element/attribute should satisfy before the XML element/attribute is stored in the relational database; data processing instructions to apply to the XML element/attribute when the XML element/attribute is stored in the relational database; and the cardinality relationship between the attribute sets of the relation.
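One way to picture conditions and data processing instructions attached to a mapping is a rule object that carries a predicate and a transformation alongside the target column. The sketch below is illustrative only: `ColumnRule`, its field names, and the price example are assumptions of this example, and cardinality relationships are not modeled.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ColumnRule:
    """Hypothetical mapping rule for one element/attribute."""
    table: str
    column: str
    # Condition the value must satisfy before it is stored.
    condition: Callable[[str], bool] = lambda value: True
    # Processing applied to the value when it is stored.
    transform: Callable[[str], object] = lambda value: value

def apply_rule(rule, value):
    """Return (table, column, stored_value), or None if the condition fails."""
    if not rule.condition(value):
        return None
    return (rule.table, rule.column, rule.transform(value))

# Store non-empty prices as integer cents.
price_rule = ColumnRule(
    "orders",
    "price_cents",
    condition=lambda v: v.strip() != "",
    transform=lambda v: round(float(v) * 100),
)

stored = apply_rule(price_rule, "19.99")
skipped = apply_rule(price_rule, "   ")
```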
- when decomposition module 110 performs a decomposition operation, decomposition module 110 first receives a mapping document 114 and then later receives an XML document (that is to be decomposed), both of which can be supplied by a user. Decomposition module 110 parses mapping document 114 to extract mapping information, which mapping information is then processed and transformed into internal data structures (discussed in greater detail below) for use during the actual decomposition of XML document 112 .
- decomposition module 110 is operable to perform processing on a mapping document as a separate, distinct user operation, which is referred to herein as registration of the mapping document.
- the registration operation makes the internal data structures persistent by storing the internal data structures (in one implementation) as metadata in database 106 , and returns to the user a unique identifier for the just processed mapping information.
- the user supplies the unique identifier for the mapping information along with the XML document (e.g., XML document 112 ) to decomposition module 110 .
- decomposition module 110 uses the unique identifier to read the persistent metadata from database 106 and restores the internal data structures in a memory of programmed computer 104 , after which, the actual decomposition of the XML document begins.
- this alternative saves a significant amount of CPU cycles of repeated work that is typically performed and discarded on each decomposition operation.
- the time spent processing the mapping document may dwarf the time spent on decomposing the XML documents. This technique also prevents inadvertent modifications to the mapping document that is supplied to the decomposition operation, and subsequent unexplained differences in the expected decomposition results.
- FIG. 2 illustrates one implementation of decomposition module 110 in greater detail.
- decomposition module 110 includes a parsing engine 202 , a registration engine 204 , and a decomposition engine 206 .
- Although three separate engines are shown in FIG. 2 by way of example, the functions associated with each engine can be combined and performed by any number of engines, including a single engine.
- parsing engine 202 parses a mapping document to extract mapping information.
- the mapping document is in the form of a set of related XML schema documents (also known as an XML schema) that are augmented with annotations that describe the mapping of XML components to tables/columns in, for example, a relational database.
- Annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema.
- Parsing engine 202 can comprise a general purpose XML schema processor that generates a representation of the data model for the annotated XML schema documents.
- the data model is defined by the W3 recommendation http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html, which is incorporated herein by reference.
- Parsing engine 202 is further operable to parse annotations (also captured in the data model) to obtain the mapping information.
- the mapping information comprises information that maps XML data to relational tables and columns.
- registration engine 204 constructs a first internal data structure of the data model of the set of related XML schema documents (generated by parsing engine 202 ).
- the internal data structure consists of nodes (which represent schema components such as model groups, particles, element or attribute declarations) and edges which connect the nodes according to their relationships in the set of related XML schema documents.
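A node-and-edge structure of this kind could be sketched in Python as follows. The `SchemaNode` class, its field names, and the `walk` helper are hypothetical; the specification does not prescribe a particular in-memory layout.

```python
from dataclasses import dataclass, field

@dataclass
class SchemaNode:
    """One schema component; `children` holds the outgoing edges."""
    kind: str   # e.g. "element", "attribute", "model_group", "particle"
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

def walk(node, depth=0):
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)

# <customer id="..."><name>...</name></customer>
root = SchemaNode("element", "customer")
sequence = root.add(SchemaNode("model_group", "sequence"))
sequence.add(SchemaNode("element", "name"))
root.add(SchemaNode("attribute", "id"))

outline = [(depth, node.kind, node.name) for depth, node in walk(root)]
```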
- registration engine 204 is further operable to create one or more second internal data structures (corresponding to each mapped table) from the mapping information (obtained by parsing engine 202 ).
- the second internal data structure consists of nodes that correspond to XML schema element/attribute declarations that have annotations mapping the element/attribute declarations to columns of tables. The mapping information from annotations attached to element/attribute declarations is saved in the corresponding node for efficient processing.
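The per-table structures can be pictured as a grouping of the mapped declarations by target table, so that everything needed to build a row for one table sits together. The following Python sketch assumes a hypothetical `MappedDecl` record and path notation; neither comes from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class MappedDecl:
    """An element/attribute declaration together with its saved mapping information."""
    path: str     # where the declaration sits in the schema (illustrative notation)
    table: str
    column: str

def build_table_structures(decls):
    """Group mapped declarations into one structure per mapped table."""
    by_table = defaultdict(list)
    for decl in decls:
        by_table[decl.table].append(decl)
    return dict(by_table)

decls = [
    MappedDecl("/order/@id", "orders", "id"),
    MappedDecl("/order/item/sku", "order_items", "sku"),
    MappedDecl("/order/total", "orders", "total"),
]
tables = build_table_structures(decls)
```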
- registration engine 204 makes the internal data structures (i.e., the first and second data structures) persistent, storing them as metadata in a database (e.g., database 106 ) and giving the metadata a unique identifier.
- registration engine 204 is operable to respectively serialize the first data structure and each instance of the second data structure into its on-disk format and write the serializations into a persistent store in the database that allows for efficient retrieval.
- the persistent store is a BLOB (Binary Large Object) column in a row of a system catalog table dedicated for each data structure. The catalog table can have an index for efficient retrieval of any specific row.
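A minimal sketch of such a persistent store, using SQLite in place of the database's system catalog, might look like the following. The table name `mapping_catalog`, its columns, the use of JSON as the serialized format, and the UUID-based identifier are all assumptions of this example; the specification only calls for a BLOB column, one row per data structure, and an index for retrieval.

```python
import json
import sqlite3
import uuid

def open_catalog():
    """Create a stand-in catalog table with one row per serialized data structure."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping_catalog ("
        " reg_id TEXT NOT NULL, kind TEXT NOT NULL,"
        " table_name TEXT NOT NULL, body BLOB NOT NULL)"
    )
    # Index so that all rows for one registration can be found efficiently.
    conn.execute("CREATE INDEX mapping_catalog_by_id ON mapping_catalog (reg_id)")
    return conn

def register(conn, schema_graph, table_structures):
    """Serialize the structures, store them under one identifier, and return it."""
    reg_id = uuid.uuid4().hex
    rows = [(reg_id, "schema", "", json.dumps(schema_graph).encode("utf-8"))]
    for name, structure in table_structures.items():
        rows.append((reg_id, "table", name, json.dumps(structure).encode("utf-8")))
    conn.executemany("INSERT INTO mapping_catalog VALUES (?, ?, ?, ?)", rows)
    return reg_id

def restore(conn, reg_id):
    """Read the rows for reg_id back and rebuild the in-memory structures."""
    schema_graph, tables = None, {}
    query = "SELECT kind, table_name, body FROM mapping_catalog WHERE reg_id = ?"
    for kind, name, body in conn.execute(query, (reg_id,)):
        data = json.loads(bytes(body).decode("utf-8"))
        if kind == "schema":
            schema_graph = data
        else:
            tables[name] = data
    return schema_graph, tables

conn = open_catalog()
graph = {"customer": ["name", "@id"]}
table_structures = {"customer": {"name": "name", "@id": "id"}}
reg_id = register(conn, graph, table_structures)
restored = restore(conn, reg_id)
```

Sharing one `reg_id` across the rows is one simple way to keep the first data structure related to its per-table structures.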
- decomposition engine 206 is operable to use the unique identifier to read the persistent metadata from the database, restore the stored internal data structures in memory, and decompose an XML document based on the restored internal data structures.
- FIG. 3 illustrates a method 300 for registering a mapping document (e.g., mapping document 114 ) in accordance with one implementation of the invention.
- a mapping document is received (e.g., by decomposition module 110 ), for example, in the form of a set of annotated XML schema documents (also known as an XML schema) (step 302 ).
- annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema or instance documents.
- the set of annotated XML schema documents is parsed (e.g., by parsing engine 202 ) (step 304 ), to generate a representation of a data model for the set of annotated XML schema documents.
- An internal data structure of the data model of the set of annotated XML schema documents is constructed (e.g., by registration engine 204 ) (step 306 ).
- Annotations associated with the set of annotated XML schema documents are parsed (e.g., by parsing engine 202 ) to obtain mapping information, and an internal data structure corresponding to each mapped table of a relational database is created (step 308 ).
- a unique identifier for the data structures is generated (e.g., by registration engine 204 ) (step 310 ).
- the internal data structure of the data model of the set of annotated XML schema documents is serialized and stored (e.g., by registration engine 204 ) (step 312 ).
- Each internal data structure corresponding to each mapped table is also serialized and stored (e.g., by registration engine 204 ) (step 314 ).
- the information necessary to relate the internal data structure of the data model of the set of annotated XML schema documents with the internal data structures of each mapped table is also stored.
- the unique identifier is assigned to metadata represented by these internal data structures. Accordingly, the internal data structures are persisted for use in subsequent decomposition operations.
- FIG. 4 illustrates a method 400 for decomposing an XML document (e.g., using decomposition engine 206 ) in accordance with one implementation of the invention.
- An XML document containing XML data is received (e.g., by decomposition module 110 ) (step 402 ).
- a unique identifier for mapping information associated with the XML document is received (e.g., by registration engine 204 ) (step 404 ).
- Stored internal data structures are retrieved (e.g., by registration engine 204 ) based on the unique identifier (step 406 ).
- the XML document is then decomposed (e.g., by decomposition engine 206 ) based on the rules embedded in the internal data structures retrieved from the database (step 408 ).
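The steps of method 400 can be sketched end to end in Python as shown below. The in-memory `REGISTRY` dictionary stands in for the persistent metadata in the database, and the identifier `map-001`, the JSON layout, and the `customer` table are hypothetical values used only for illustration.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for persisted metadata: identifier -> serialized per-table structure.
# Each structure maps a child element name or "@attribute" to a column name.
REGISTRY = {
    "map-001": json.dumps({"customer": {"@id": "id", "name": "name"}}),
}

def decompose_by_id(xml_text, reg_id, conn):
    root = ET.fromstring(xml_text)            # step 402: receive the XML document
    serialized = REGISTRY[reg_id]             # steps 404-406: look up by identifier
    tables = json.loads(serialized)           # restore the structures in memory
    for table, columns in tables.items():     # step 408: decompose
        values = [
            root.get(key[1:]) if key.startswith("@") else root.findtext(key)
            for key in columns
        ]
        names = ", ".join(columns.values())
        marks = ", ".join("?" * len(values))
        # Table and column names come from registered metadata, not from the
        # document being decomposed; the values are bound as parameters.
        conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})", values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, name TEXT)")
decompose_by_id('<customer id="42"><name>Ada</name></customer>', "map-001", conn)
decompose_by_id('<customer id="43"><name>Grace</name></customer>', "map-001", conn)
```

Note that the second call reuses the registered structure without touching the original mapping document, which is the saving this technique is aiming for.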
- One or more of the method steps described above can be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output.
- the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
- Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
- FIG. 5 illustrates a data processing system 500 suitable for storing and/or executing program code.
- Data processing system 500 includes a processor 502 coupled to memory elements 504 A-B through a system bus 506 .
- data processing system 500 may include more than one processor and each processor may be coupled directly or indirectly to one or more memory elements through a system bus.
- Memory elements 504 A-B can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution.
- I/O devices 508 A-B (including, but not limited to, keyboards, displays, pointing devices, etc.) are coupled to data processing system 500 .
- I/O devices 508 A-B may be coupled to data processing system 500 directly or indirectly through intervening I/O controllers (not shown).
- a network adapter 510 is coupled to data processing system 500 to enable data processing system 500 to become coupled to other data processing systems or remote printers or storage devices through communication link 512 .
- Communication link 512 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- the targets of the mapping information can be targets other than entities in a database.
- database 106 can be a database other than a relational database, such as a hierarchical database, or other type of database. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the following claims.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Pre-processing mapping information for efficient decomposition of an XML document for storage in a database. A mapping document is received that describes how all of (or a portion of) an XML document is to be decomposed, the mapping document is transformed into a data structure for decomposing an XML document, and the data structure is made persistent for use with a subsequent decomposition operation that decomposes an XML document.
Description
- This application is a continuation application under 35 U.S.C. §120 and claims priority to U.S. patent application Ser. No. 11/351,467, filed Feb. 10, 2006, entitled, “Method and Apparatus for Pre-processing Mapping Information for Efficient Decomposition of XML Documents,” all of which is incorporated herein by reference.
- The present invention relates generally to databases, and more particularly to pre-processing mapping information for efficient decomposition of an XML document for storage in a database.
- Databases are computerized information storage and retrieval systems. There are many different types of databases. One particular type of database is a relational database that includes a relational database management system (RDBMS). A relational database management system (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into physical tables which consist of rows and columns of data. The rows are formally called “tuples”. A database will typically have many physical tables and each physical table will typically have multiple tuples and multiple columns. The physical tables are typically stored on direct access storage devices (DASD) such as magnetic or optical disk drives for semi-permanent storage.
- Increasingly, applications are storing XML documents, or parts thereof, in relational databases. An XML document can be stored in a relational database through a process of decomposition, i.e., breaking the XML document into component pieces (or portions) and storing the component pieces in the relational database. The specification of the component pieces and where the component pieces are to be stored in the relational database is typically accomplished through a mapping document. The mapping document contains information as to which XML elements/attributes are mapped to which table and column in the relational database.
- On each decomposition operation, along with the XML document to be decomposed, the mapping document is also typically supplied by the user. The mapping document must be parsed to extract mapping information, which must then be processed and transformed into internal data structures for use during the actual decomposition of the XML document. A single mapping document can be used to decompose any instance XML document that conforms to the structure that the mapping document describes. Accordingly, the same mapping document can be used to decompose many XML documents over any time period. Conventional techniques for decomposing an XML document, however, may not save the results of the processing of the mapping document and, therefore, the same processing of the mapping document must be repeated each time a different XML document is decomposed.
- Accordingly, what is needed is an improved technique of processing the mapping information which reduces the amount of time required to decompose XML documents. The present invention addresses such a need.
- In general, in one aspect, this specification describes a method for pre-processing mapping information for efficient decomposition of an XML document for storage in a database. The method includes receiving a mapping document that describes how all of (or a portion of) the XML document is to be decomposed, transforming the mapping document into a data structure for decomposing the XML document, and making the data structure persistent for use with a subsequent decomposition operation that decomposes an XML document.
- Particular implementations can include one or more of the following features. Making the data structure persistent can include storing the data structure in the database. Storing the data structure can include assigning a unique identifier to the data structure, and using the unique identifier to later retrieve the data structure from the database on a subsequent decomposition of any XML document that conforms to the XML schema. Storing the data structure can include storing the data structure as metadata in the database. The mapping document can be in the form of a set of related XML schema documents (also known as an XML schema). The XML schema documents can be augmented with one or more annotations that describe a mapping of XML elements and attributes to the database. Transforming the mapping document can include parsing the XML schema to produce a representation of a data model associated with the XML schema, wherein the data structure represents the data model.
- The data structure can include one or more nodes that represent XML schema components including model groups, particles, or element or attribute declarations, and include one or more edges that connect the one or more nodes according to relationships defined in the XML schema. Transforming the mapping document can further include parsing the one or more annotations to obtain the mapping information that maps XML data to the database. The method can further include creating one or more second data structures for each mapped table in the database based on the mapping information. The method can further include assigning a unique identifier to the set of data structures (i.e., the data structure corresponding to the mapping document and the one or more second data structures as a whole) and storing the set of data structures as metadata in the database for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema. In addition, information relating the data structure corresponding to the mapping document with the one or more corresponding second data structures can also be stored. The database can be a relational database.
- In general, in another aspect, this specification describes a computer program product, tangibly stored on a computer-readable medium, for pre-processing mapping information for efficient decomposition of an XML document for storage in a database. The product includes instructions to cause a programmable processor to receive a mapping document. The mapping document can be in the form of a set of related XML schema documents (also known as an XML schema). The mapping document describes how portions of the XML document are to be decomposed. The product further includes instructions to transform the mapping document into a data structure for decomposing the XML document, and make the data structure persistent for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema.
- In general, in another aspect, this specification describes a decomposition module for pre-processing mapping information for efficient decomposition of an XML document for storage in a database. The decomposition module includes an engine operable to receive a mapping document that describes how portions of the XML document are to be decomposed. The mapping document can be in the form of a set of related XML schema documents. The engine is operable to transform the mapping document into a data structure that can be used for efficient decomposition of the XML document, and make the data structure persistent for use with a subsequent decomposition operation that decomposes any XML document that conforms to the XML schema.
- Implementations may provide one or more of the following advantages. The present specification describes techniques for decomposing an XML document that save a significant amount of CPU (processor) cycles as compared to conventional decomposition techniques. Additionally, inadvertent (human) modifications to a mapping document that is supplied to a decomposition operation are avoided, along with subsequent unexplained differences in the expected decomposition results. Since the mapping document is stored in the database, there is a record of the mapping information used to perform a decomposition operation. Such a record can be useful for diagnostics or audit trails.
- The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a block diagram of a data processing system including a decomposition module in accordance with one implementation of the invention.
- FIG. 2 is a block diagram illustrating the decomposition module of FIG. 1 in accordance with one implementation of the invention.
- FIG. 3 illustrates a method for registering a mapping document in accordance with one implementation of the invention.
- FIG. 4 illustrates a method for decomposing an XML document in accordance with one implementation of the invention.
- FIG. 5 is a block diagram of a data processing system suitable for storing and/or executing program code in accordance with one implementation of the invention.
- Like reference symbols in the various drawings indicate like elements.
- Implementations of the present invention relate generally to databases, and more particularly to pre-processing mapping information for efficient decomposition of an XML document for storage in a database. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to implementations and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present invention is not intended to be limited to the implementations shown but is to be accorded the widest scope consistent with the principles and features described herein.
-
FIG. 1 illustrates a data processing system 100 in accordance with one implementation of the invention. Data processing system 100 includes input and output devices 102, a programmed computer 104, and a database 106. Input and output devices 102 can include devices such as a printer, a keyboard, a mouse, a digitizing pen, a display, and the like. Programmed computer 104 can be any type of computer system, including for example, a workstation, a desktop computer, a laptop computer, a personal digital assistant (PDA), a cell phone, a network, and so on. Database 106 can be a relational database including one or more tables (not shown) for storing data.
- Running on programmed computer 104 is a database management system (DBMS) 108 including a decomposition module 110. In one implementation, the database management system (DBMS) 108 and decomposition module 110 are features of DB2 available from International Business Machines Corporation of Armonk, N.Y. In one implementation, database management system (DBMS) 108 and decomposition module 110 use relational techniques for storing and retrieving XML documents from database 106. Accordingly, in one implementation, decomposition module 110 is operable to receive an XML document 112, decompose XML document 112 into fragmented data, and store the fragmented data internally within database 106.
- Decomposition of an XML document is the process of breaking the XML document into component pieces and storing the component pieces in, e.g., a relational database. The specification of the component pieces and where the component pieces are to be stored in the relational database is typically accomplished through a mapping document (e.g., mapping document 114). In one implementation, the mapping document is in the form of a set of related XML schema documents (also known as an XML schema) that describe the structure of conforming XML instance documents. See http://www.w3.org/TR/xmlschema-1/ and http://www.w3.org/TR/xmlschema-2/ for the W3C recommendations for the specification of XML schema, which are incorporated herein by reference. The set of related XML schema documents can be augmented with annotations that describe the mapping of XML components to tables/columns in, e.g., a relational database. Annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema or instance XML documents. The mapping document minimally contains information as to which XML elements/attributes are mapped to which table and column in, for example, a relational database.
Additional information that a mapping document can contain includes: specification of the conditions which the XML element/attribute should satisfy before the XML element/attribute is stored in the relational database; data processing instructions to apply to the XML element/attribute when the XML element/attribute is stored in the relational database; and the cardinality relationship between the attribute sets of the relation.
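- A minimal sketch of how mapping information might be read from an annotated schema is given below, assuming the annotations are carried as foreign-namespace attributes on element and attribute declarations. The namespace `urn:example:decomp`, the attribute names `table` and `column`, and the sample schema are hypothetical and are not the annotation vocabulary of any particular product; conditions and data processing instructions are omitted.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MAP = "urn:example:decomp"   # hypothetical annotation namespace, not a real product namespace

ANNOTATED_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:map="urn:example:decomp">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="customer" type="xs:string"
                    map:table="ORDERS" map:column="CUSTOMER"/>
        <xs:element name="note" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer"
                    map:table="ORDERS" map:column="ORDER_ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def extract_mappings(schema_text):
    """Return (kind, name, table, column) for every annotated declaration."""
    root = ET.fromstring(schema_text)
    mappings = []
    for decl in root.iter():
        # Only element and attribute declarations can carry a mapping here.
        if decl.tag not in (f"{{{XS}}}element", f"{{{XS}}}attribute"):
            continue
        table = decl.get(f"{{{MAP}}}table")
        column = decl.get(f"{{{MAP}}}column")
        if table and column:
            kind = decl.tag.split("}")[1]   # "element" or "attribute"
            mappings.append((kind, decl.get("name"), table, column))
    return mappings


mappings = extract_mappings(ANNOTATED_SCHEMA)
```

Declarations without both annotation attributes, such as `note` above, are simply left unmapped.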
- In one implementation, when decomposition module 110 performs a decomposition operation, decomposition module 110 first receives a mapping document 114 and then later receives an XML document (that is to be decomposed), both of which can be supplied by a user. Decomposition module 110 parses mapping document 114 to extract mapping information, which is then processed and transformed into internal data structures (discussed in greater detail below) for use during the actual decomposition of XML document 112. Unlike a conventional decomposition module, which processes a mapping document to create the internal data structures each time an XML document is to be decomposed, decomposition module 110 is operable to process a mapping document as a separate, distinct user operation, which is referred to herein as registration of the mapping document. The registration operation makes the internal data structures persistent by storing the internal data structures (in one implementation) as metadata in database 106, and returns to the user a unique identifier for the just-processed mapping information.
- Accordingly, in one implementation, on each decomposition operation the user supplies the unique identifier for the mapping information along with the XML document (e.g., XML document 112) to decomposition module 110. Using the unique identifier, decomposition module 110 reads the persistent metadata from database 106 and restores the internal data structures in a memory of programmed computer 104, after which the actual decomposition of the XML document begins. Because the same mapping document is used to decompose many XML documents over any time period, this approach saves a significant amount of CPU cycles of repeated work that is typically performed and discarded on each decomposition operation. For smaller XML documents, the time spent processing the mapping document may dwarf the time spent on decomposing the XML documents. This technique also prevents inadvertent modifications to the mapping document that is supplied to the decomposition operation, and subsequent unexplained differences in the expected decomposition results.
-
FIG. 2 illustrates one implementation of decomposition module 110 in greater detail. As shown in FIG. 2, decomposition module 110 includes a parsing engine 202, a registration engine 204, and a decomposition engine 206. Although three separate engines are shown in FIG. 2 by way of example, the functions associated with each engine can be combined and performed by any number of engines, including a single engine.
- In one implementation, parsing engine 202 parses a mapping document to extract mapping information. In one implementation, the mapping document is in the form of a set of related XML schema documents (also known as an XML schema) that are augmented with annotations that describe the mapping of XML components to tables/columns in, for example, a relational database. Annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema. Parsing engine 202 can comprise a general purpose XML schema processor that generates a representation of the data model for the annotated XML schema documents. In one implementation, the data model is defined by the W3C recommendation http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html, which is incorporated herein by reference. Parsing engine 202 is further operable to parse annotations (also captured in the data model) to obtain the mapping information. In one implementation, the mapping information comprises information that maps XML data to relational tables and columns.
- In one implementation, registration engine 204 constructs a first internal data structure of the data model of the set of related XML schema documents (generated by parsing engine 202). In one implementation, the internal data structure consists of nodes (which represent schema components such as model groups, particles, element or attribute declarations) and edges which connect the nodes according to their relationships in the set of related XML schema documents. In one implementation, registration engine 204 is further operable to create one or more second internal data structures (corresponding to each mapped table) from the mapping information (obtained by parsing engine 202). In one implementation, the second internal data structure consists of nodes that correspond to XML schema element/attribute declarations that have annotations mapping the element/attribute declarations to columns of tables. The mapping information from annotations attached to element/attribute declarations is saved in the corresponding node for efficient processing.
- In one implementation, registration engine 204 makes the internal data structures (i.e., the first and second data structures) persistent, storing them as metadata in a database (e.g., database 106) and giving the metadata a unique identifier. For example, registration engine 204 is operable to respectively serialize the first data structure and each instance of the second data structure into its on-disk format and write the serializations into a persistent store in the database that allows for efficient retrieval. In one implementation, the persistent store is a BLOB (Binary Large Object) column in a row of a system catalog table dedicated for each data structure. The catalog table can have an index for efficient retrieval of any specific row. Accordingly, decomposition engine 206 is operable to use the unique identifier to read the persistent metadata from the database, restore the stored internal data structures in memory, and decompose an XML document based on the restored internal data structures.
-
FIG. 3 illustrates a method 300 for registering a mapping document (e.g., mapping document 114) in accordance with one implementation of the invention. A mapping document is received (e.g., by decomposition module 110), for example, in the form of a set of annotated XML schema documents (also known as an XML schema) (step 302). As discussed above, annotations are a feature of XML schema that provide for application-specific information to be supplied to programs processing the schema or instance documents. The set of annotated XML schema documents is parsed (e.g., by parsing engine 202) (step 304), to generate a representation of a data model for the set of annotated XML schema documents. An internal data structure of the data model of the set of annotated XML schema documents is constructed (e.g., by registration engine 204) (step 306). Annotations associated with the set of annotated XML schema documents are parsed (e.g., by parsing engine 202) to obtain mapping information, and an internal data structure corresponding to each mapped table of a relational database is created (step 308). A unique identifier for the data structures is generated (e.g., by registration engine 204) (step 310). The internal data structure of the data model of the set of annotated XML schema documents is serialized and stored (e.g., by registration engine 204) (step 312). Each internal data structure corresponding to each mapped table is also serialized and stored (e.g., by registration engine 204) (step 314). In addition, the information necessary to relate the internal data structure of the data model of the set of annotated XML schema documents with the internal data structures of each mapped table is also stored. In one implementation, the unique identifier is assigned to metadata represented by these internal data structures. Accordingly, the internal data structures are persisted for use in subsequent decomposition operations.
- FIG. 4 illustrates a method 400 for decomposing an XML document (e.g., using decomposition engine 206) in accordance with one implementation of the invention. An XML document containing XML data is received (e.g., by decomposition module 110) (step 402). A unique identifier for mapping information associated with the XML document is received (e.g., by registration engine 204) (step 404). Stored internal data structures are retrieved (e.g., by registration engine 204) based on the unique identifier (step 406). The XML document is then decomposed (e.g., by decomposition engine 206) based on the rules embedded in the internal data structures retrieved from the database (step 408).
- One or more of the method steps described above can be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. Generally, the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
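- The register-once, decompose-many flow of methods 300 and 400 can be sketched in software roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database, JSON stands in for the on-disk serialization format, and the table name `mapping_catalog`, the function names, and the flat list of `(kind, name, table, column)` entries are all hypothetical.

```python
import json
import sqlite3
import uuid
import xml.etree.ElementTree as ET


def open_catalog():
    """Create an in-memory stand-in for the system catalog table."""
    db = sqlite3.connect(":memory:")
    # The primary key gives indexed lookup of a single registration row.
    db.execute("CREATE TABLE mapping_catalog (id TEXT PRIMARY KEY, structures BLOB)")
    return db


def register(db, mappings):
    """Serialize pre-processed mapping information, store it, return its identifier."""
    registration_id = str(uuid.uuid4())
    blob = json.dumps(mappings).encode("utf-8")
    db.execute("INSERT INTO mapping_catalog VALUES (?, ?)", (registration_id, blob))
    return registration_id


def decompose(db, registration_id, xml_text):
    """Restore the stored structures and split the document into (table, column, value)."""
    row = db.execute(
        "SELECT structures FROM mapping_catalog WHERE id = ?", (registration_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown registration id: {registration_id}")
    mappings = json.loads(row[0].decode("utf-8"))
    root = ET.fromstring(xml_text)
    cells = []
    for elem in root.iter():
        for kind, name, table, column in mappings:
            if kind == "element" and elem.tag == name and elem.text:
                cells.append((table, column, elem.text.strip()))
            elif kind == "attribute" and name in elem.attrib:
                cells.append((table, column, elem.attrib[name]))
    return cells


db = open_catalog()
reg_id = register(db, [
    ["element", "customer", "ORDERS", "CUSTOMER"],
    ["attribute", "id", "ORDERS", "ORDER_ID"],
])
# The same identifier can be reused for any number of conforming documents.
cells = decompose(db, reg_id, '<order id="7"><customer>Ada</customer></order>')
```

In this simplified form the schema is never re-parsed during `decompose`; only the stored bytes are read back. Matching attributes by name alone, without regard to the owning element, is a shortcut taken for brevity.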
- Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
-
FIG. 5 illustrates a data processing system 500 suitable for storing and/or executing program code. Data processing system 500 includes a processor 502 coupled to memory elements 504A-B through a system bus 506. In other embodiments, data processing system 500 may include more than one processor and each processor may be coupled directly or indirectly to one or more memory elements through a system bus.
- Memory elements 504A-B can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution. As shown, input/output or I/O devices 508A-B (including, but not limited to, keyboards, displays, pointing devices, etc.) are coupled to data processing system 500. I/O devices 508A-B may be coupled to data processing system 500 directly or indirectly through intervening I/O controllers (not shown).
- In the embodiment, a network adapter 510 is coupled to data processing system 500 to enable data processing system 500 to become coupled to other data processing systems or remote printers or storage devices through communication link 512. Communication link 512 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- Various implementations for decomposing an XML document for storage in a database have been described. Nevertheless, one of ordinary skill in the art will readily recognize that various modifications may be made to the implementations, and any variation would be within the spirit and scope of the present invention. For example, the steps of methods discussed above can be performed in a different order to achieve desirable results. In addition, one or more aspects of the invention (e.g., pre-processing of mapping information and making the results of the processing persistent) can apply to technologies other than databases. Furthermore, the target specified in the mapping information can be targets other than entities in a database. Additionally, database 106 can be a database other than a relational database, such as a hierarchical database, or other type of database. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the following claims.
Claims (20)
1. A method for decomposing an XML document for storage in a relational database, the method comprising:
receiving a mapping document, the mapping document describing how components of the XML document are to be decomposed, the mapping document being in the form of a set of related XML schema documents augmented with one or more annotations that describe a mapping of the decomposed components to specified relational tables and columns in the relational database;
transforming the mapping document into a data structure for decomposing the XML document; and
making the data structure persistent for use with a subsequent decomposition operation that decomposes the XML document, by storing the data structure in the relational database, such that the data structure is retrieved and used with the subsequent decomposition operation and need not be transformed from the mapping document for the subsequent decomposition operation.
2. The method of claim 1, wherein storing the data structure comprises assigning a unique identifier to the data structure, and using the unique identifier to later retrieve the data structure from the database on a subsequent decomposition of the XML document.
3. The method of claim 1, further wherein storing the data structure comprises storing the data structure as metadata in the database.
4. The method of claim 1, wherein:
transforming the mapping document comprises parsing the set of related XML schema documents to produce a representation of a data model associated with the set of related XML schema documents, wherein the data structure represents the data model.
5. The method of claim 4, wherein the data structure comprises one or more nodes that represent schema components including model groups, particles, or element or attribute declarations, and comprises one or more edges that connect the one or more nodes according to relationships defined in the set of related XML schema documents.
6. The method of claim 4, wherein transforming the mapping document further comprises parsing the one or more annotations to obtain mapping information that maps XML data to the database, and the method further includes:
creating one or more second data structures for each mapped table in the database based on the mapping information.
7. The method of claim 4, further comprising assigning the unique identifier to the one or more second data structures and storing the one or more second data structures as metadata in the database using the unique identifier for use with a subsequent decomposition operation that decomposes the XML document.
8. A computer program product, tangibly stored on a physical computer-readable storage medium, for decomposing an XML document for storage in a relational database, the product comprising instructions executed by and causing a programmable processor to:
receive a mapping document, the mapping document describing how components of the XML document are to be decomposed, the mapping document being in the form of a set of related XML schema documents augmented with one or more annotations that describe a mapping of the decomposed components to specified relational tables and columns in the relational database;
transform the mapping document into a data structure for decomposing the XML document; and
make the data structure persistent for use with a subsequent decomposition operation that decomposes the XML document, by storing the data structure in the relational database, such that the data structure is retrieved and used with the subsequent decomposition operation and need not be transformed from the mapping document for the subsequent decomposition operation.
9. The product of claim 8, wherein the instructions to store the data structure includes instructions to assign a unique identifier to the data structure, and use the unique identifier to later retrieve the data structure from the database on a subsequent decomposition of the XML document.
10. The product of claim 8, wherein the instructions to store the data structure includes instructions to store the data structure as metadata in the database.
11. The product of claim 8, wherein:
the instructions to transform the mapping document includes instructions to parse the set of related XML schema documents to produce a representation of a data model associated with the set of related XML schema documents, wherein the data structure represents the data model.
12. The product of claim 11, wherein the data structure comprises one or more nodes that represent schema components including model groups, particles, or element or attribute declarations, and comprises one or more edges that connect the one or more nodes according to relationships defined in the set of related XML schema documents.
13. The product of claim 11, wherein the instructions to transform the mapping document further includes instructions to parse the one or more annotations to obtain mapping information that maps XML data to the database, and the product further comprises instructions executed by and causing a programmable processor to:
create one or more second data structures for each mapped table in the database based on the mapping information.
14. The product of claim 13, further comprising instructions executed by and causing a programmable processor to assign the unique identifier to the one or more second data structures and store the one or more second data structures as metadata in the database using the unique identifier for use with a subsequent decomposition operation that decomposes the XML document.
15. A decomposition module for decomposing an XML document for storage in a relational database, the decomposition module comprising:
an engine operable to receive a mapping document describing how portions of an XML document are to be decomposed, the mapping document describing how components of the XML document are to be decomposed, the mapping document being in the form of a set of related XML schema documents augmented with one or more annotations that describe a mapping of the decomposed components to specified relational tables and columns in the relational database;
wherein the engine is operable to transform the mapping document into a data structure for decomposing an XML document, and make the data structure persistent for use with a subsequent decomposition operation that decomposes any XML document that conforms to the set of related XML schema documents, by storing the data structure in the relational database, such that the data structure is retrieved and used with the subsequent decomposition operation and need not be transformed from the mapping document for the subsequent decomposition operation.
16. The decomposition module of claim 15, wherein the engine is further operable to assign a unique identifier to the data structure, and use the unique identifier to later retrieve the data structure from the database on a subsequent decomposition of the XML document.
17. The decomposition module of claim 15, wherein:
the engine transforming the mapping document includes parsing the set of related XML schema documents to produce a representation of a data model associated with the set of related XML schema documents, wherein the data structure represents the data model.
18. The decomposition module of claim 17, wherein the data structure comprises one or more nodes that represent schema components including model groups, particles, or element or attribute declarations, and comprises one or more edges that connect the one or more nodes according to relationships defined in the set of related XML schema documents.
19. The decomposition module of claim 17, wherein the instructions to transform the mapping document further includes instructions to parse the one or more annotations to obtain mapping information that maps XML data to the database, and the product further comprises instructions to cause a programmable processor to:
create one or more second data structures for each mapped table in the database based on the mapping information.
20. The decomposition module of claim 19, further comprising instructions to cause a programmable processor to assign the unique identifier to the one or more second data structures and store the one or more second data structures as metadata in the database using the unique identifier for use with a subsequent decomposition operation that decomposes the XML document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/182,075 US20080281842A1 (en) | 2006-02-10 | 2008-07-29 | Apparatus and method for pre-processing mapping information for efficient decomposition of xml documents |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/351,467 US7529758B2 (en) | 2006-02-10 | 2006-02-10 | Method for pre-processing mapping information for efficient decomposition of XML documents |
US12/182,075 US20080281842A1 (en) | 2006-02-10 | 2008-07-29 | Apparatus and method for pre-processing mapping information for efficient decomposition of xml documents |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/351,467 Continuation US7529758B2 (en) | 2006-02-10 | 2006-02-10 | Method for pre-processing mapping information for efficient decomposition of XML documents |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080281842A1 (en) | 2008-11-13 |
Family
ID=38429602
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/351,467 Expired - Fee Related US7529758B2 (en) | 2006-02-10 | 2006-02-10 | Method for pre-processing mapping information for efficient decomposition of XML documents |
US12/182,075 Abandoned US20080281842A1 (en) | 2006-02-10 | 2008-07-29 | Apparatus and method for pre-processing mapping information for efficient decomposition of xml documents |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/351,467 Expired - Fee Related US7529758B2 (en) | 2006-02-10 | 2006-02-10 | Method for pre-processing mapping information for efficient decomposition of XML documents |
Country Status (1)
Country | Link |
---|---|
US (2) | US7529758B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130246394A1 (en) * | 2012-03-13 | 2013-09-19 | International Business Machines Corporation | Structured large object (lob) data |
US11567920B2 (en) * | 2020-09-15 | 2023-01-31 | Sap Se | Master data mapping scheme permitting querying |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7620641B2 (en) * | 2004-12-22 | 2009-11-17 | International Business Machines Corporation | System and method for context-sensitive decomposition of XML documents based on schemas with reusable element/attribute declarations |
US20060136483A1 (en) * | 2004-12-22 | 2006-06-22 | International Business Machines Corporation | System and method of decomposition of multiple items into the same table-column pair |
US7529758B2 (en) * | 2006-02-10 | 2009-05-05 | International Business Machines Corporation | Method for pre-processing mapping information for efficient decomposition of XML documents |
US9811579B1 (en) | 2012-11-21 | 2017-11-07 | Christopher A. Olson | Document relational mapping |
US11816177B2 (en) * | 2021-07-21 | 2023-11-14 | Yext, Inc. | Streaming static web page generation |
Citations (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5644776A (en) * | 1991-07-19 | 1997-07-01 | Inso Providence Corporation | Data processing system and method for random access formatting of a portion of a large hierarchical electronically published document with descriptive markup |
US5787449A (en) * | 1994-06-02 | 1998-07-28 | Infrastructures For Information Inc. | Method and system for manipulating the architecture and the content of a document separately from each other |
US20020099687A1 (en) * | 2000-09-07 | 2002-07-25 | Muralidhar Krishnaprasad | Apparatus and method for mapping relational data and metadata to XML |
US20020123993A1 (en) * | 1999-12-02 | 2002-09-05 | Chau Hoang K. | XML document processing |
US20020133497A1 (en) * | 2000-08-01 | 2002-09-19 | Draper Denise L. | Nested conditional relations (NCR) model and algebra |
US6480865B1 (en) * | 1998-10-05 | 2002-11-12 | International Business Machines Corporation | Facility for adding dynamism to an extensible markup language |
US20030018666A1 (en) * | 2001-07-17 | 2003-01-23 | International Business Machines Corporation | Interoperable retrieval and deposit using annotated schema to interface between industrial document specification languages |
US20030120665A1 (en) * | 2001-05-25 | 2003-06-26 | Joshua Fox | Run-time architecture for enterprise integration with transformation generation |
US20030126136A1 (en) * | 2001-06-22 | 2003-07-03 | Nosa Omoigui | System and method for knowledge retrieval, management, delivery and presentation |
US20030140308A1 (en) * | 2001-09-28 | 2003-07-24 | Ravi Murthy | Mechanism for mapping XML schemas to object-relational database systems |
US20030149934A1 (en) * | 2000-05-11 | 2003-08-07 | Worden Robert Peel | Computer program connecting the structure of a xml document to its underlying meaning |
US6606620B1 (en) * | 2000-07-24 | 2003-08-12 | International Business Machines Corporation | Method and system for classifying semi-structured documents |
US20030163597A1 (en) * | 2001-05-25 | 2003-08-28 | Hellman Ziv Zalman | Method and system for collaborative ontology modeling |
US20030182268A1 (en) * | 2002-03-18 | 2003-09-25 | International Business Machines Corporation | Method and system for storing and querying of markup based documents in a relational database |
US20030204481A1 (en) * | 2001-07-31 | 2003-10-30 | International Business Machines Corporation | Method and system for visually constructing XML schemas using an object-oriented model |
US6665682B1 (en) * | 1999-07-19 | 2003-12-16 | International Business Machines Corporation | Performance of table insertion by using multiple tables or multiple threads |
US20030236718A1 (en) * | 2002-06-14 | 2003-12-25 | Yang Lou Ping | Buyer, multi-supplier, multi-stage supply chain management system |
US20030237047A1 (en) * | 2002-06-18 | 2003-12-25 | Microsoft Corporation | Comparing hierarchically-structured documents |
US20040015783A1 (en) * | 2002-06-20 | 2004-01-22 | Canon Kabushiki Kaisha | Methods for interactively defining transforms and for generating queries by manipulating existing query data |
US6687873B1 (en) * | 2000-03-09 | 2004-02-03 | Electronic Data Systems Corporation | Method and system for reporting XML data from a legacy computer system |
US20040030701A1 (en) * | 2000-11-20 | 2004-02-12 | Kirstan Vandersluis | Method for componentization of electronic document processing |
US20040068694A1 (en) * | 2002-10-03 | 2004-04-08 | Kaler Christopher G. | Grouping and nesting hierarchical namespaces |
US20040143581A1 (en) * | 2003-01-15 | 2004-07-22 | Bohannon Philip L. | Cost-based storage of extensible markup language (XML) data |
US20040162833A1 (en) * | 2003-02-13 | 2004-08-19 | Microsoft Corporation | Linking elements of a document to corresponding fields, queries and/or procedures in a database |
US20040199524A1 (en) * | 2000-03-17 | 2004-10-07 | Michael Rys | Systems and methods for transforming query results into hierarchical information |
US20040205082A1 (en) * | 2003-04-14 | 2004-10-14 | International Business Machines Corporation | System and method for querying XML streams |
US6836778B2 (en) * | 2003-05-01 | 2004-12-28 | Oracle International Corporation | Techniques for changing XML content in a relational database |
US20050015383A1 (en) * | 2003-07-15 | 2005-01-20 | Microsoft Corporation | Method and system for accessing database objects in polyarchical relationships using data path expressions |
US20050027681A1 (en) * | 2001-12-20 | 2005-02-03 | Microsoft Corporation | Methods and systems for model matching |
US20050050068A1 (en) * | 2003-08-29 | 2005-03-03 | Alexander Vaschillo | Mapping architecture for arbitrary data models |
US20050091188A1 (en) * | 2003-10-24 | 2005-04-28 | Microsoft | Indexing XML datatype content system and method |
US20050149552A1 (en) * | 2003-12-23 | 2005-07-07 | Canon Kabushiki Kaisha | Method of generating data servers for heterogeneous data sources |
US20050160110A1 (en) * | 2004-01-16 | 2005-07-21 | Charlet Kyle J. | Apparatus, system, and method for defining a metadata schema to facilitate passing data between an extensible markup language document and a hierarchical database |
US20050177578A1 (en) * | 2004-02-10 | 2005-08-11 | Chen Yao-Ching S. | Efficient type annontation of XML schema-validated XML documents without schema validation |
US20050198013A1 (en) * | 2004-03-08 | 2005-09-08 | Microsoft Corporation | Structured indexes on results of function applications over data |
US20050278358A1 (en) * | 2004-06-08 | 2005-12-15 | Oracle International Corporation | Method of and system for providing positional based object to XML mapping |
US20060031757A9 (en) * | 2003-06-11 | 2006-02-09 | Vincent Winchel T Iii | System for creating and editing mark up language forms and documents |
US20060101058A1 (en) * | 2004-11-10 | 2006-05-11 | Xerox Corporation | System and method for transforming legacy documents into XML documents |
US20060136483A1 (en) * | 2004-12-22 | 2006-06-22 | International Business Machines Corporation | System and method of decomposition of multiple items into the same table-column pair |
US20060136435A1 (en) * | 2004-12-22 | 2006-06-22 | International Business Machines Corporation | System and method for context-sensitive decomposition of XML documents based on schemas with reusable element/attribute declarations |
US7072896B2 (en) * | 2000-02-16 | 2006-07-04 | Verizon Laboratories Inc. | System and method for automatic loading of an XML document defined by a document-type definition into a relational database including the generation of a relational schema therefor |
US7096422B2 (en) * | 2003-02-28 | 2006-08-22 | Microsoft Corporation | Markup language visual mapping |
US7103611B2 (en) * | 2003-05-01 | 2006-09-05 | Oracle International Corporation | Techniques for retaining hierarchical information in mapping between XML documents and relational data |
US20060206523A1 (en) * | 2005-03-14 | 2006-09-14 | Microsoft Corporation | Single-pass translation of flat-file documents into XML format including validation, ambiguity resolution, and acknowledgement generation |
US7168035B1 (en) * | 2003-06-11 | 2007-01-23 | Microsoft Corporation | Building a view on markup language data through a set of components |
US20070067343A1 (en) * | 2005-09-21 | 2007-03-22 | International Business Machines Corporation | Determining the structure of relations and content of tuples from XML schema components |
US20070198543A1 (en) * | 2006-02-10 | 2007-08-23 | International Business Machines Corporation | Method and apparatus for pre-processing mapping information for efficient decomposition of XML documents |
US7308455B2 (en) * | 2004-12-22 | 2007-12-11 | International Business Machines Corporation | System and method for decomposition of multiple items into the same table-column pair without dedicated mapping constructs |
US7437374B2 (en) * | 2004-02-10 | 2008-10-14 | International Business Machines Corporation | Efficient XML schema validation of XML fragments using annotated automaton encoding |
- 2006-02-10: US application US11/351,467, published as US7529758B2 (not active, Expired - Fee Related)
- 2008-07-29: US application US12/182,075, published as US20080281842A1 (not active, Abandoned)
Patent Citations (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5644776A (en) * | 1991-07-19 | 1997-07-01 | Inso Providence Corporation | Data processing system and method for random access formatting of a portion of a large hierarchical electronically published document with descriptive markup |
US5787449A (en) * | 1994-06-02 | 1998-07-28 | Infrastructures For Information Inc. | Method and system for manipulating the architecture and the content of a document separately from each other |
US6480865B1 (en) * | 1998-10-05 | 2002-11-12 | International Business Machines Corporation | Facility for adding dynamism to an extensible markup language |
US6665682B1 (en) * | 1999-07-19 | 2003-12-16 | International Business Machines Corporation | Performance of table insertion by using multiple tables or multiple threads |
US20020123993A1 (en) * | 1999-12-02 | 2002-09-05 | Chau Hoang K. | XML document processing |
US20020133484A1 (en) * | 1999-12-02 | 2002-09-19 | International Business Machines Corporation | Storing fragmented XML data into a relational database by decomposing XML documents with application specific mappings |
US6643633B2 (en) * | 1999-12-02 | 2003-11-04 | International Business Machines Corporation | Storing fragmented XML data into a relational database by decomposing XML documents with application specific mappings |
US6721727B2 (en) * | 1999-12-02 | 2004-04-13 | International Business Machines Corporation | XML documents stored as column data |
US7072896B2 (en) * | 2000-02-16 | 2006-07-04 | Verizon Laboratories Inc. | System and method for automatic loading of an XML document defined by a document-type definition into a relational database including the generation of a relational schema therefor |
US6687873B1 (en) * | 2000-03-09 | 2004-02-03 | Electronic Data Systems Corporation | Method and system for reporting XML data from a legacy computer system |
US20040199524A1 (en) * | 2000-03-17 | 2004-10-07 | Michael Rys | Systems and methods for transforming query results into hierarchical information |
US20030149934A1 (en) * | 2000-05-11 | 2003-08-07 | Worden Robert Peel | Computer program connecting the structure of a xml document to its underlying meaning |
US6606620B1 (en) * | 2000-07-24 | 2003-08-12 | International Business Machines Corporation | Method and system for classifying semi-structured documents |
US20020133497A1 (en) * | 2000-08-01 | 2002-09-19 | Draper Denise L. | Nested conditional relations (NCR) model and algebra |
US20020099687A1 (en) * | 2000-09-07 | 2002-07-25 | Muralidhar Krishnaprasad | Apparatus and method for mapping relational data and metadata to XML |
US20040030701A1 (en) * | 2000-11-20 | 2004-02-12 | Kirstan Vandersluis | Method for componentization of electronic document processing |
US20030163597A1 (en) * | 2001-05-25 | 2003-08-28 | Hellman Ziv Zalman | Method and system for collaborative ontology modeling |
US20030120665A1 (en) * | 2001-05-25 | 2003-06-26 | Joshua Fox | Run-time architecture for enterprise integration with transformation generation |
US20030126136A1 (en) * | 2001-06-22 | 2003-07-03 | Nosa Omoigui | System and method for knowledge retrieval, management, delivery and presentation |
US20030018666A1 (en) * | 2001-07-17 | 2003-01-23 | International Business Machines Corporation | Interoperable retrieval and deposit using annotated schema to interface between industrial document specification languages |
US20030204481A1 (en) * | 2001-07-31 | 2003-10-30 | International Business Machines Corporation | Method and system for visually constructing XML schemas using an object-oriented model |
US7096224B2 (en) * | 2001-09-28 | 2006-08-22 | Oracle International Corporation | Mechanism for mapping XML schemas to object-relational database systems |
US20030140308A1 (en) * | 2001-09-28 | 2003-07-24 | Ravi Murthy | Mechanism for mapping XML schemas to object-relational database systems |
US20050027681A1 (en) * | 2001-12-20 | 2005-02-03 | Microsoft Corporation | Methods and systems for model matching |
US20050060332A1 (en) * | 2001-12-20 | 2005-03-17 | Microsoft Corporation | Methods and systems for model matching |
US20030182268A1 (en) * | 2002-03-18 | 2003-09-25 | International Business Machines Corporation | Method and system for storing and querying of markup based documents in a relational database |
US20030236718A1 (en) * | 2002-06-14 | 2003-12-25 | Yang Lou Ping | Buyer, multi-supplier, multi-stage supply chain management system |
US20030237047A1 (en) * | 2002-06-18 | 2003-12-25 | Microsoft Corporation | Comparing hierarchically-structured documents |
US20040015783A1 (en) * | 2002-06-20 | 2004-01-22 | Canon Kabushiki Kaisha | Methods for interactively defining transforms and for generating queries by manipulating existing query data |
US20040068694A1 (en) * | 2002-10-03 | 2004-04-08 | Kaler Christopher G. | Grouping and nesting hierarchical namespaces |
US20040143581A1 (en) * | 2003-01-15 | 2004-07-22 | Bohannon Philip L. | Cost-based storage of extensible markup language (XML) data |
US20040162833A1 (en) * | 2003-02-13 | 2004-08-19 | Microsoft Corporation | Linking elements of a document to corresponding fields, queries and/or procedures in a database |
US7096422B2 (en) * | 2003-02-28 | 2006-08-22 | Microsoft Corporation | Markup language visual mapping |
US20040205082A1 (en) * | 2003-04-14 | 2004-10-14 | International Business Machines Corporation | System and method for querying XML streams |
US6836778B2 (en) * | 2003-05-01 | 2004-12-28 | Oracle International Corporation | Techniques for changing XML content in a relational database |
US7103611B2 (en) * | 2003-05-01 | 2006-09-05 | Oracle International Corporation | Techniques for retaining hierarchical information in mapping between XML documents and relational data |
US7168035B1 (en) * | 2003-06-11 | 2007-01-23 | Microsoft Corporation | Building a view on markup language data through a set of components |
US20060031757A9 (en) * | 2003-06-11 | 2006-02-09 | Vincent Winchel T Iii | System for creating and editing mark up language forms and documents |
US20050015383A1 (en) * | 2003-07-15 | 2005-01-20 | Microsoft Corporation | Method and system for accessing database objects in polyarchical relationships using data path expressions |
US20050050068A1 (en) * | 2003-08-29 | 2005-03-03 | Alexander Vaschillo | Mapping architecture for arbitrary data models |
US20050091188A1 (en) * | 2003-10-24 | 2005-04-28 | Microsoft | Indexing XML datatype content system and method |
US20050149552A1 (en) * | 2003-12-23 | 2005-07-07 | Canon Kabushiki Kaisha | Method of generating data servers for heterogeneous data sources |
US20050160110A1 (en) * | 2004-01-16 | 2005-07-21 | Charlet Kyle J. | Apparatus, system, and method for defining a metadata schema to facilitate passing data between an extensible markup language document and a hierarchical database |
US20050177578A1 (en) * | 2004-02-10 | 2005-08-11 | Chen Yao-Ching S. | Efficient type annontation of XML schema-validated XML documents without schema validation |
US7437374B2 (en) * | 2004-02-10 | 2008-10-14 | International Business Machines Corporation | Efficient XML schema validation of XML fragments using annotated automaton encoding |
US20050198013A1 (en) * | 2004-03-08 | 2005-09-08 | Microsoft Corporation | Structured indexes on results of function applications over data |
US20050278358A1 (en) * | 2004-06-08 | 2005-12-15 | Oracle International Corporation | Method of and system for providing positional based object to XML mapping |
US20060101058A1 (en) * | 2004-11-10 | 2006-05-11 | Xerox Corporation | System and method for transforming legacy documents into XML documents |
US20060136435A1 (en) * | 2004-12-22 | 2006-06-22 | International Business Machines Corporation | System and method for context-sensitive decomposition of XML documents based on schemas with reusable element/attribute declarations |
US20060136483A1 (en) * | 2004-12-22 | 2006-06-22 | International Business Machines Corporation | System and method of decomposition of multiple items into the same table-column pair |
US7308455B2 (en) * | 2004-12-22 | 2007-12-11 | International Business Machines Corporation | System and method for decomposition of multiple items into the same table-column pair without dedicated mapping constructs |
US20060206523A1 (en) * | 2005-03-14 | 2006-09-14 | Microsoft Corporation | Single-pass translation of flat-file documents into XML format including validation, ambiguity resolution, and acknowledgement generation |
US20070067343A1 (en) * | 2005-09-21 | 2007-03-22 | International Business Machines Corporation | Determining the structure of relations and content of tuples from XML schema components |
US20070198543A1 (en) * | 2006-02-10 | 2007-08-23 | International Business Machines Corporation | Method and apparatus for pre-processing mapping information for efficient decomposition of XML documents |
US7529758B2 (en) * | 2006-02-10 | 2009-05-05 | International Business Machines Corporation | Method for pre-processing mapping information for efficient decomposition of XML documents |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130246394A1 (en) * | 2012-03-13 | 2013-09-19 | International Business Machines Corporation | Structured large object (lob) data |
US8676788B2 (en) * | 2012-03-13 | 2014-03-18 | International Business Machines Corporation | Structured large object (LOB) data |
US8832081B2 (en) | 2012-03-13 | 2014-09-09 | International Business Machines Corporation | Structured large object (LOB) data |
US11567920B2 (en) * | 2020-09-15 | 2023-01-31 | Sap Se | Master data mapping scheme permitting querying |
Also Published As
Publication number | Publication date |
---|---|
US20070198543A1 (en) | 2007-08-23 |
US7529758B2 (en) | 2009-05-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6832219B2 (en) | Method and system for storing and querying of markup based documents in a relational database | |
CN110032604B (en) | Data storage device, translation device and database access method | |
Li | Transforming relational database into HBase: A case study | |
US5963932A (en) | Method and apparatus for transforming queries | |
US8103705B2 (en) | System and method for storing text annotations with associated type information in a structured data store | |
Lee et al. | NeT & CoT: Translating relational schemas to XML schemas using semantic constraints | |
US6449620B1 (en) | Method and apparatus for generating information pages using semi-structured data stored in a structured manner | |
US8099725B2 (en) | Method and apparatus for generating code for an extract, transform, and load (ETL) data flow | |
US6581062B1 (en) | Method and apparatus for storing semi-structured data in a structured manner | |
US8886617B2 (en) | Query-based searching using a virtual table | |
US8838636B2 (en) | Unifying hetrogenous data | |
US7634498B2 (en) | Indexing XML datatype content system and method | |
US9639542B2 (en) | Dynamic mapping of extensible datasets to relational database schemas | |
US7246114B2 (en) | System and method for presenting a query expressed in terms of an object model | |
US20050160076A1 (en) | Method and apparatus for referring to database integration, and computer product | |
US7529758B2 (en) | Method for pre-processing mapping information for efficient decomposition of XML documents | |
EP2211277A1 (en) | Method and apparatus for generating an integrated view of multiple databases | |
US20090077625A1 (en) | Associating information related to components in structured documents stored in their native format in a database | |
CN116257610B (en) | Intelligent question-answering method, device, equipment and medium based on industry knowledge graph | |
CN114661832B (en) | Multi-mode heterogeneous data storage method and system based on data quality | |
CN113704575A (en) | SQL method, device, equipment and storage medium for analyzing XML and Java files | |
US20070282804A1 (en) | Apparatus and method for extracting database information from a report | |
CN108804580A (en) | A method of the key word of the inquiry in federal type RDF data library | |
US20060235820A1 (en) | Relational query of a hierarchical database | |
Nassiri et al. | One query to retrieve XML and relational data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |