US20160224623A1 - Workflow Processing System and Method with Database System Support - Google Patents

Info

Publication number
US20160224623A1
Authority
US
United States
Prior art keywords
data management
data
management system
references
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/094,101
Inventor
Mike Grasselt
Albert Maier
Bernhard Mitschang
Oliver Suhre
Charles Daniel Wolfson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US15/094,101
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors' interest (see document for details). Assignors: GRASSELT, MIKE; SUHRE, OLIVER; MAIER, ALBERT; MITSCHANG, BERNHARD; WOLFSON, CHARLES DANIEL
Publication of US20160224623A1
Current legal status: Abandoned

Classifications

    • G06F17/30442
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/282 Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database
    • G06F17/30292
    • G06F17/30339
    • G06F17/30477
    • G06F17/30589
    • G06F17/30917
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • the invention relates to automatic workflow processing in a workflow processing computer system and a computer system.
  • workflow systems known in the art allow automatic execution of a sequence of processing activities expressed in a workflow description.
  • a typical pattern that can be found in workflows is to extract business data from a variety of different data sources and to combine that data for further processing in subsequent activities in the workflow.
  • the various data processing activities within a workflow are directed to each particular data source individually containing the data in question, such as a particular relational database system of a specific system vendor.
  • users of the workflow system design workflows in which portions of the required business data are obtained by writing a plurality of data requests, each of which is directed to the appropriate data source, for instance using Structured Query Language (SQL) statements.
  • a federated database system is a type of meta-database management system, which transparently integrates multiple autonomous database systems into a single federated database.
  • these systems integrate heterogeneous data from various databases and provide a uniform front-end user interface to let data requested from the system appear as being from a single data source.
  • Using federated database systems with existing workflow systems has proven to require a significant amount of additional technical complexity that has to be handled by the user of a workflow system.
  • In order to address the federated database system from the workflow system, the user would need to understand the federated system, particularly how to integrate additional data sources into the system and how to map concepts from his workflow system to concepts of the federated system.
  • In case of changes of the integration logic of the workflow, the user might need to refer to the federated system and find out which changes need to be applied before being able to return to the workflow system and change the workflow accordingly.
  • the invention provides methods and apparatus, including computer program products, implementing and using techniques for automatic workflow processing in a workflow processing computer system.
  • a data management activity description is received.
  • a set of set references associated with the data management activity is determined.
  • a set of data sources associated with the set of set references is determined.
  • the set of data sources is provided within a data management system. It is determined whether the data management system includes infrastructure for accessing the set of set references and for accessing the set of data sources. If the infrastructure is not included, the infrastructure is automatically created from information in a metadata repository coupled to the data management system.
  • References to set references and references to data sources in the data management activity description are replaced by references to the infrastructure in the data management system.
  • the data management activity description is delivered for execution by the system.
  • FIG. 1 shows a schematic system overview in accordance with one embodiment of the present invention.
  • FIG. 2 shows a flow diagram in accordance with one embodiment of the present invention.
  • FIG. 3 shows a flow diagram of a detail of the method of FIG. 2, in accordance with one embodiment of the present invention.
  • FIG. 4 shows further details of the method of FIG. 2, in accordance with one embodiment of the present invention.
  • FIG. 1 shows a schematic overview of the system in accordance with one embodiment of the present invention.
  • Workflow processing system computer 1 includes a federated data management system support module 2 that is coupled to a federated data management system 3 and to a metadata management system/metadata repository 4.
  • workflow system 1 is coupled to data sources 5 and 5′.
  • Metadata repository 4 is coupled to databases 5 and 5′.
  • Federated data management system 3 is coupled to databases 5 and 5′.
  • data sources 5 and 5′ represent any number of a variety of different data sources.
  • Workflow system 1 executes a sequence of processing activities and needs to extract business data from data sources 5, 5′ for this purpose.
  • Data sources 5 and 5′ are of different types and contain different sets of business data.
  • Databases 5 and 5′ may be from different vendors, each having a different implementation and different access functionalities.
  • Federated data management system 3 has access to data sources 5 and 5′. It is able to receive a single request for data, transform it into requests to the individual data sources, and distribute these requests to the data sources in order to collect the data.
  • Federated data management system support module 2 performs the automated interaction between workflow system 1 and federated data management system 3 using metadata information acquired from metadata repository 4. The operation of these components will be further explained with regard to the process shown in the flow diagram of FIG. 2.
  • FIG. 2 shows a flow diagram as an overview of the mode of operation in accordance with one embodiment of the present invention.
  • federated data management system support module 2 of the workflow system 1 receives a data management activity (DMA) description containing requests for business data from one or more of databases 5, 5′.
  • the federated data management system support module 2 determines the set of set references that are associated with the input DMA description in step 110.
  • the federated data management system support module 2 determines the set of all data sources associated with the previously determined set of set references and adds the data source that is itself associated with the currently processed data management activity to the set of data sources.
  • the federated data management system support module verifies whether the federated data management system 3 already contains the entire infrastructure necessary to access that individual data source. If the necessary infrastructure is not contained in the federated data management system 3, the federated data management system support module checks whether the information needed to create this infrastructure can be obtained from the metadata repository 4. If the necessary information to create the infrastructure can be derived from the metadata repository, the federated data management system support module 2 creates the infrastructure accordingly and adds the infrastructure to the federated data management system in step 200. In the event that the information necessary for creating the infrastructure cannot be derived from the metadata repository 4, the federated data management system support module 2 discontinues operation and returns with an error code informing the user which information is needed to run this activity.
  • the federated data management system support module 2 further verifies, for each member of the set of set references, whether the federated data management system 3 already contains the necessary infrastructure to access the data from the set each set reference is pointing to. If the necessary infrastructure is not present in the federated data management system 3, the support module 2 checks whether the information needed to create this infrastructure can be obtained from the metadata repository 4, and if so, creates the infrastructure in step 200 and adds it to the federated database system 3 accordingly. Again, if the information necessary to create the infrastructure cannot be derived from the metadata repository 4, the federated data management system support module 2 discontinues operation and returns with an error message.
  • the federated data management system support module 2 modifies the DMA description by replacing references to set references and references to data sources, which originally are directed to each data source individually, by references that are directed to the infrastructure in the federated data management system 3.
  • the DMA description thus modified is delivered for execution by the workflow system.
  • the process detects whether a corresponding wrapper statement exists for each different type of data source, and further whether a corresponding server statement is present for each different individual data source. Accordingly, when checking, for each member of the set of set references, whether the federated data management system contains the necessary infrastructure to access the data from the set each set reference is pointing to, the federated data management system support module 2 detects whether a corresponding nickname is present for each different set (i.e., table).
  • appropriate creation of lacking statements in the infrastructure of the federated data management system includes creating wrapper statements for each different type of data source in step 210, creating server statements for each different data source in step 220, and creating nickname statements for each different table to access in step 230 where necessary.
  • FIG. 4 shows two further aspects of an embodiment in more detail:
  • the federated data management system support module 2 checks whether the federated data management system is capable of executing all of the operations defined in the input DMA description. If the federated data management system cannot execute all operations of the DMA description with the present infrastructure, the support module creates additional infrastructure in steps 104 and 106. For this, it first checks whether the required information for creating appropriate user mapping artefacts and appropriate function mapping and type mapping artefacts can be obtained from metadata repository 4 and, if so, creates the appropriate statements and returns to the normal process run as specified in various embodiments of the present invention. If the information necessary for creating this additional infrastructure cannot be obtained from the metadata repository, operation is discontinued and an error code is generated.
  • the federated data management system support module 2 not only determines the set of data sources in step 120 but also checks whether the number of identified data sources is equal to 1 in step 122. If there is only one element in the set of data sources, it is not necessary to transform the DMA description to address the federated data management system, since all data processing operations and data access operations of the DMA description are directed to the same individual data source (5, or alternatively 5′). In this case, the originally received DMA description is returned for execution in step 150, and no further transformation processing is performed.
  • I1 loads a first set of business data from a DB2 database for z/OS:
  • This SQL statement returns a set reference (SR1) to the workflow engine (as opposed to a copy of the data set, thus saving data space and network capacity).
  • SQL statement I3 loads business data from an Excel data source:
  • the user is enabled, with the system in accordance with one embodiment of the present invention, to write a simple SQL statement to join the three set references and apply filter functions to acquire the desired result set, like the following SQL statement I4 to be executed by the workflow engine (transparently forwarding it to the federated data management system):
  • the infrastructure automatically created by the system in steps 200 to 230 and 104 and 106 is exemplified by the following create statements:
  • create wrapper Net8;
    create server oraserver type oracle version 8 wrapper Net8 authorization "demo" password "cdidemo" options (node 'iidemo2');
    create user mapping for user SERVER oraserver OPTIONS (REMOTE_AUTHID 'demo', REMOTE_PASSWORD 'cdidemo');
    create nickname n2 for oraserver.Investments;
  • Metadata repository 4 includes information about the characteristics of the data sources of the present system. Metadata repository 4 further offers a search function that allows specifying at least one characteristic of a database object, such as its JNDI name, and to access the object and its further characteristics.
  • the characteristics thus detected from the metadata repository are that the system is of type “Oracle 8”, that it supports the “Net8” protocol, that it can be accessed using its name “iidemo2” when using said protocol, and that, for establishing the connection, user name “demo” and password “cdidemo” can be used.
  • This information is directly included in the respective “create wrapper” and “create server” statements of the above example.
  • the metadata repository is queried to provide, for the current authorization-ID under which the user accesses the workflow system, the Oracle user-ID and password that allows said user to log in to the Oracle system.
  • statement I4 is modified by the system to direct all data accesses to the federated data management system, using the infrastructure as described above to run the following SQL statement:
  • the federated data management system support module acquires information about the data sources and the tables used in the SQL statements specified by the user by accessing a metadata management system 4.
  • the metadata management system 4 is capable of storing additional information about the system environment and is able to use different types of specifications for data sources depending on the specific federated system and the specific workflow system. For instance, the workflow engine might use Java Database Connectivity (JDBC) connections and specifications whereas the federated data management system might use ODBC connections and specifications.
  • the metadata management system 4 holds such mapping information and further holds mapping information for different user managements.
  • each of the systems includes a particular user management, so that a user having access to a first database 5 in the system is not automatically granted access to a second database 5′ in the system. Therefore, many such user mappings may exist, all of which are to be reported to the metadata repository.
  • a common user directory can be implemented for all participating systems, for instance a directory service using the Lightweight Directory Access Protocol (LDAP). Combinations combining both a central and distributed user management are possible.
  • Another embodiment of the present invention is a combination of the foregoing embodiments with the optimization of workflows in a workflow system as specified in the European Patent Application No. EP05108096.8, entitled “Optimization in Workflow Management Systems,” by International Business Machines Corporation.
  • the embodiments described therein relate to optimization of data management activities in workflows.
  • the various data request operations, such as I1, I2, and I3 included in a DMA description can be optimized to be directed towards a single data source, namely the federated data management system.
  • Each of those SQL statements is executed against the workflow engine and returns a set reference to the workflow engine appropriately.
  • nicknames are used that have been created in the federated system. Further, the resulting set references are pointing into the federated system and not to the individual data sources.
  • I1 (Load from DB2 for z/OS) is a SQL statement executed against a DB2 for z/OS data source and returns a set reference SR1 to the workflow engine: SELECT r.cid, r.rating FROM ratings r
  • I4 is a SQL statement executed against the workflow engine and returns a set reference SR4 to the workflow engine:
  • Embodiments of the invention can include one or more of the following advantages.
  • Workflow activity description means that describe specific nodes in a workflow and that express data management activities such as SQL statements, stored procedures, XQuery expressions, and so on, can be handled.
  • the data does not need to be copied from the data source to the workflow system at runtime, since set references allow the system (and thus the user) to refer to sets (database tables) rather than copy them. Since the set references can be passed between activities instead of the actual sets, performance of the workflow system can be significantly improved.
  • the data necessary for detecting if the federated data management system comprises the necessary infrastructure for accessing the set of set references and the set of data sources is collected.
  • Each data source in the set of data sources that is to profit from embodiments of the present invention is serviced by the federated data management system.
  • the user of the workflow system does not have to deal with the additional complexity of manually addressing the federated data management system.
  • the workflow system user can use the workflow management system in a straightforward manner and directly formulate data processing logic directed towards the system.
  • the federated data management system support module comprised in the workflow processing system carrying out the method automatically takes care of setting up the necessary infrastructure in the federated system that is needed to run the queries which have been formulated.
  • a state of the art metadata source can be used to retrieve information about system components and their capabilities and contents for all system components of the whole system landscape.
  • the original references programmed by the user of the workflow management system in the manner the user is used to are automatically replaced with centralized references to the federated database management system, so that the resulting modified data management activity description can be executed with the federated data management system, the federated data management system taking care of collecting the data from the various data sources.
  • Various embodiments of the present invention thus allow users to specify separate queries against component databases while writing efficient and easy-to-handle join statements, all of which can, for instance, be formulated in the Structured Query Language (SQL). Further, users of the workflow system can specify elaborate queries against the workflow system in a direct manner. In both cases, the workflow system in accordance with various embodiments of the present invention takes care of accessing the data over the federated data management system.
  • Data management statements intrinsic to the data handling system can be used for creating the infrastructure, and the created infrastructure is stored for reuse in future runs, thus improving efficiency of the system.
  • information can be derived from the metadata repository coupled to the system as well.
  • the steps of creating a user mapping artefact as well as creating a function mapping artefact and type mapping artefacts can be utilized.
  • the system can detect if all data access operations are going against the same data source, in which case a federation is not necessary, and thus saves processing time by not utilizing the federated data management system but having the workflow management system directing the data management activity directly to the individual data source.
  • the invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
  • the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and so on.
  • the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • Input/output (I/O) devices, including but not limited to keyboards, displays, and pointing devices, can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
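
The process of FIG. 2 described above (steps 110, 120, 122 and 200 to 230) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `SetReference`, `FederatedSystem`, `MissingMetadataError` and `process_dma`, the dictionary-based metadata repository, and the nickname naming scheme are all hypothetical, and the DMA description is reduced to a single SQL string.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SetReference:
    """Hypothetical pointer to a set (table) held in one data source."""
    name: str    # name used in the DMA description, e.g. "SR1"
    source: str  # data source that holds the set
    table: str   # table the reference points to


@dataclass
class FederatedSystem:
    """Hypothetical stand-in for the federated data management system."""
    wrappers: set = field(default_factory=set)     # one per data source type
    servers: set = field(default_factory=set)      # one per data source
    nicknames: dict = field(default_factory=dict)  # (source, table) -> nickname


class MissingMetadataError(Exception):
    """The metadata repository lacks information needed for the infrastructure."""


def process_dma(statement, own_source, known_refs, fed, metadata):
    """Return (statement, target) for execution; rewrite only if federation is needed."""
    # Step 110: set references that the DMA description actually mentions.
    refs = [r for r in known_refs
            if re.search(rf"\b{re.escape(r.name)}\b", statement)]
    # Step 120: their data sources plus the activity's own data source.
    sources = {r.source for r in refs} | {own_source}
    # Step 122: a single data source needs no federation; return unchanged.
    if len(sources) == 1:
        return statement, own_source
    # Steps 200 to 230: create whatever infrastructure is still missing.
    for source in sorted(sources):
        if source not in metadata:
            raise MissingMetadataError(f"no metadata for data source {source!r}")
        fed.wrappers.add(metadata[source]["type"])  # no-op if already present
        fed.servers.add(source)                     # no-op if already present
    for ref in refs:
        fed.nicknames.setdefault(
            (ref.source, ref.table), f"n_{ref.source}_{ref.table}".lower())
    # Redirect every set reference to its nickname in the federated system.
    for ref in refs:
        nickname = fed.nicknames[(ref.source, ref.table)]
        statement = re.sub(rf"\b{re.escape(ref.name)}\b", nickname, statement)
    return statement, "federated"
```

In this sketch a statement that touches two data sources comes back with its set references replaced by nicknames and the target `"federated"`, while a statement that touches only one data source is returned untouched together with that data source. A real system would also handle user, function and type mappings (steps 104 and 106), which are omitted here.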

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Databases & Information Systems
  • Physics & Mathematics
  • General Physics & Mathematics
  • General Engineering & Computer Science
  • Data Mining & Analysis
  • Business, Economics & Management
  • Entrepreneurship & Innovation
  • Strategic Management
  • Economics
  • Human Resources & Organizations
  • Software Systems
  • Computational Linguistics
  • Game Theory and Decision Science
  • Educational Administration
  • Development Economics
  • Marketing
  • Operations Research
  • Quality & Reliability
  • Tourism & Hospitality
  • General Business, Economics & Management
  • Management, Administration, Business Operations System, And Electronic Commerce
  • Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

Automatic workflow processing in a workflow processing computer system. A data management system support module receives a data management activity (DMA) description. A set of set references associated with the DMA is determined. Using the set of set references, a set of data sources associated with the set references is determined. The data sources are provided within a data management system. Using the set references and the data sources, it is determined whether the data management system includes infrastructure for accessing the set references and for accessing the data sources. If not, the infrastructure is automatically created from information in a metadata repository coupled to the data management system. References to set references and references to data sources in the data management activity description are replaced by references to the infrastructure. The data management activity description is delivered for execution by the data management system.
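
The automatic creation of missing infrastructure from metadata can be sketched as a small Python function that emits only the create statements for objects not yet present. This is a hedged illustration: the function name `missing_infrastructure_ddl`, the layout of the `meta` dictionary and the `existing` set are assumptions, and the statement shapes simply mirror the create statements quoted in the definitions above; they have not been validated against any particular federated database product.

```python
def missing_infrastructure_ddl(meta, tables, existing):
    """Return create statements for objects whose names are not in `existing`.

    meta     -- hypothetical metadata record for one data source
    tables   -- mapping of nickname -> remote table name
    existing -- names of wrappers, servers and nicknames already created
    """
    ddl = []
    if meta["wrapper"] not in existing:
        ddl.append(f"create wrapper {meta['wrapper']};")
    if meta["server"] not in existing:
        ddl.append(
            f"create server {meta['server']} type {meta['type']} "
            f"version {meta['version']} wrapper {meta['wrapper']} "
            f"authorization \"{meta['user']}\" password \"{meta['password']}\" "
            f"options (node '{meta['node']}');"
        )
    for nickname, table in tables.items():
        if nickname not in existing:
            ddl.append(f"create nickname {nickname} for {meta['server']}.{table};")
    return ddl
```

Called with the characteristics named in the description (wrapper `Net8`, server `oraserver`, node `iidemo2`, user `demo`) and an `existing` set that already contains `Net8`, the function returns only the server and nickname statements. Plain string interpolation is used for readability only; code that builds statements from credentials would need proper quoting and secret handling.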

Description

  • TECHNICAL FIELD
  • The invention relates to automatic workflow processing in a workflow processing computer system and a computer system.
  • BACKGROUND
  • Workflow systems known in the art allow automatic execution of a sequence of processing activities expressed in a workflow description. A typical pattern that can be found in workflows is to extract business data from a variety of different data sources and to combine that data for further processing in subsequent activities in the workflow.
  • In such systems, the various data processing activities within a workflow are directed to each particular data source individually containing the data in question, such as a particular relational database system of a specific system vendor. Thus, users of the workflow system design workflows in which portions of the required business data are obtained by writing a plurality of data requests, each of which is directed to the appropriate data source, for instance using Structured Query Language (SQL) statements.
  • Further, the various data portions thus acquired need to be combined to become the desired set of data to be processed in the further execution of the workflow, requiring additional technical effort for programming proprietary code for combining the data. This is an additional burden to the workflow system user, since the code is typically difficult to develop and thus a potential source of errors in data processing. Moreover, combining data from various sources oftentimes is inefficient and resource-consuming.
  • Furthermore, federated data management systems are known in the art. A federated database system is a type of meta-database management system, which transparently integrates multiple autonomous database systems into a single federated database. Thus, these systems integrate heterogeneous data from various databases and provide a uniform front-end user interface to let data requested from the system appear as being from a single data source.
  • Using federated database systems with existing workflow systems has proven to require a significant amount of additional technical complexity that has to be handled by the user of a workflow system. In order to address the federated database system from the workflow system, the user would need to understand the federated system, particularly how to integrate additional data sources into the system and how to map concepts from his workflow system to concepts of the federated system. In case of changes of the integration logic of the workflow, the user might need to refer to the federated system and find out which changes need to be applied before being able to return to the workflow system and change the workflow accordingly.
  • Thus, it would be desirable to have a workflow management system and a method of automatic workflow processing in such a system, and a computer system, data processing program, computer program product, and computer data signal therefor, each of which enables automated execution of a workflow while acquiring data from a number of different data sources in a manner transparent to the user of the workflow system while avoiding the disadvantages of the systems of the state of the art.
  • SUMMARY
  • In general, in one aspect, the invention provides methods and apparatus, including computer program products, implementing and using techniques for automatic workflow processing in a workflow processing computer system. A data management activity description is received. A set of set references associated with the data management activity is determined. A set of data sources associated with the set of set references is determined. The set of data sources is provided within a data management system. It is determined whether the data management system includes infrastructure for accessing the set of set references and for accessing the set of data sources. If the infrastructure is not included, the infrastructure is automatically created from information in a metadata repository coupled to the data management system. References to set references and references to data sources in the data management activity description are replaced by references to the infrastructure in the data management system. The data management activity description is delivered for execution by the data management system.
  • The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will be apparent from the description and drawings, and from the claims.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 shows a schematic system overview in accordance with one embodiment of the present invention.
  • FIG. 2 shows a flow diagram in accordance with one embodiment of the present invention.
  • FIG. 3 shows a flow diagram of a detail of the method of FIG. 2, in accordance with one embodiment of the present invention.
  • FIG. 4 shows further details of the method of FIG. 2, in accordance with one embodiment of the present invention.
  • Like reference symbols in the various drawings indicate like elements.
  • DETAILED DESCRIPTION
  • FIG. 1 shows a schematic overview of the system in accordance with one embodiment of the present invention. Workflow processing system computer 1 includes a federated data management system support module 2 that is coupled to a federated data management system 3 and to a metadata management system/metadata repository 4.
  • Further, workflow system 1 is coupled to data sources 5 and 5′. Metadata repository 4 is coupled to databases 5 and 5′. Federated data management system 3 is coupled to databases 5 and 5′. For all embodiments, data sources 5 and 5′ represent any number of a variety of different data sources.
  • Workflow system 1 executes a sequence of processing activities and needs to extract business data from data sources 5, 5′ for this purpose. Data sources 5 and 5′ are of a different type and contain different sets of business data. Databases 5 and 5′ may be from different vendors, each having a different implementation and different access functionalities. Federated data management system 3 has access to data sources 5 and 5′ and is able to receive a single request for data and appropriately transform the request to requests to the individual data sources and distribute these requests to the data sources in order to collect the data appropriately.
  • Federated data management system support module 2 performs the automated interaction between workflow system 1 and federated data management system 3 using metadata information acquired from metadata repository 4. The operation of these components will be further explained with regard to the process shown in the flow diagram of FIG. 2.
  • FIG. 2 shows a flow diagram as an overview of the mode of operation in accordance with one embodiment of the present invention. In step 100, federated data management system support module 2 of the workflow system 1 receives a data management activity (DMA) description containing requests for business data from one or more of databases 5, 5′. Typically, such a DMA description contains multiple requests for data to multiple data sources. Then, the federated data management system support module 2 determines the set of set references that are associated with the input DMA description in step 110. After this, in step 120, the federated data management system support module 2 determines the set of all data sources associated with the previously determined set of set references and adds the data source which itself is associated with the currently processed data management activity to the set of data sources.
  • Then, for each member of the set of data sources, the federated data management system support module verifies whether the federated data management system 3 already contains the entire infrastructure necessary to access that individual data source. If the necessary infrastructure is not contained in the federated data management system 3, the federated data management system support module checks whether the information needed to create this infrastructure can be obtained from the metadata repository 4. If the necessary information to create the infrastructure can be derived from the metadata repository, the federated data management system support module 2 creates the infrastructure accordingly and adds the infrastructure to the federated data management system in step 200. In the event that the information necessary for creating the infrastructure cannot be derived from the metadata repository 4, the federated data management system support module 2 discontinues operation and returns with an error code informing the user which information is needed to run this activity.
  • In a similar manner, in step 130 the federated data management system support module 2 further verifies for each member of the set of set references if the federated data management system 3 already contains the necessary infrastructure to access the data from the set each set reference is pointing to. If the necessary infrastructure is not present in the federated data management system 3, the support module 2 checks whether the information needed to create this infrastructure can be obtained from the metadata repository 4, and if so, creates the infrastructure in step 200 and adds it to the federated database system 3 accordingly. Again, if the information necessary to create the infrastructure cannot be derived from the metadata repository 4, the federated data management system support module 2 discontinues operation and returns with an error message.
  • Then, the federated data management system support module 2 modifies the DMA description by replacing references to set references and references to data sources, which originally are directed to each data source individually, by such references that are directed to the infrastructure in the federated data management system 3. Finally, in step 150, the DMA description thus modified is delivered for execution by the workflow system.
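  • The control flow of steps 100 to 150 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `FederatedSystem`, `MissingMetadataError`, and `process_dma`, the `nick_` prefix, and the modelling of the metadata repository as a plain set of describable data sources are all hypothetical, and the textual substitution is only a stand-in for a real SQL rewrite.

```python
import re
from dataclasses import dataclass, field


class MissingMetadataError(Exception):
    """The metadata repository lacks information needed to build infrastructure."""


@dataclass
class FederatedSystem:
    servers: set = field(default_factory=set)      # data sources with access infrastructure
    nicknames: dict = field(default_factory=dict)  # set reference name -> nickname


def process_dma(dma_sql, set_refs, own_source, fed, metadata):
    """Sketch of steps 100 to 150 for one DMA description.

    set_refs maps each set reference name to the data source holding its set;
    metadata is the set of data sources the repository can describe.
    """
    # Step 120: sources behind the set references, plus the activity's own source.
    sources = set(set_refs.values()) | {own_source}

    # Check each data source; create missing infrastructure (step 200) or fail.
    for source in sorted(sources):
        if source not in fed.servers:
            if source not in metadata:
                raise MissingMetadataError(f"missing metadata for data source {source}")
            fed.servers.add(source)

    # Step 130: check each set reference; create a nickname where none exists.
    for name in sorted(set_refs):
        fed.nicknames.setdefault(name, f"nick_{name}")

    # Replace set references by references to the federated infrastructure.
    for name in set_refs:
        dma_sql = re.sub(rf"\b{re.escape(name)}\b", fed.nicknames[name], dma_sql)

    # Step 150: the modified description is handed over for execution.
    return dma_sql
```

  • Because the nickname table is kept in the `FederatedSystem` object, a second run over the same set references finds the infrastructure already present and only performs the replacement.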
  • When determining whether the federated data management system has the necessary infrastructure to access the data sources comprised in the set of data sources, the process detects whether a corresponding wrapper statement exists for each different type of data source, and further whether a corresponding server statement is present for each different individual data source. Accordingly, when checking, for each member of the set of set references, whether the federated data management system contains the necessary infrastructure to access the data from the set to which that set reference points, the federated data management system support module 2 detects whether a corresponding nickname is present for each different set (i.e., table). Thus, as shown in FIG. 3, appropriate creation of lacking statements in the infrastructure of the federated data management system includes creating wrapper statements for each different type of data source in step 210, creating server statements for each different data source in step 220, and creating nickname statements for each different table to access in step 230 where necessary.
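  • The three checks of FIG. 3 can be illustrated with a short sketch that emits only the statements still lacking. The function name `missing_infrastructure`, the shape of the `existing` catalog, and the `nick_` naming scheme are assumptions made for illustration, and the emitted statement text is deliberately abbreviated rather than complete syntax of any particular federated system.

```python
def missing_infrastructure(sources, tables, existing):
    """Return the create statements for infrastructure that is not yet present.

    sources:  list of (server_name, source_type) pairs
    tables:   list of (server_name, table_name) pairs
    existing: dict with the keys "wrappers", "servers", and "nicknames",
              each holding the set of artefacts already in the federated system
    """
    statements = []

    # Step 210: one wrapper per different type of data source.
    for source_type in sorted({source_type for _, source_type in sources}):
        if source_type not in existing["wrappers"]:
            statements.append(f"create wrapper {source_type};")

    # Step 220: one server per different data source.
    for server, source_type in sources:
        if server not in existing["servers"]:
            statements.append(f"create server {server} wrapper {source_type};")

    # Step 230: one nickname per different table to access.
    for server, table in tables:
        if (server, table) not in existing["nicknames"]:
            statements.append(f"create nickname nick_{table} for {server}.{table};")

    return statements
```

  • When every artefact already exists, the function returns an empty list, which corresponds to the case in which no infrastructure has to be created.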
  • FIG. 4 shows two further aspects of an embodiment in more detail:
  • First, the federated data management system support module 2 checks whether the federated data management system is capable of executing all of the operations defined in the input DMA description. If the federated data management system cannot execute all operations of the DMA description with the present infrastructure, the support module creates additional infrastructure in steps 104 and 106. For this, it first checks whether the required information for creating appropriate user mapping artefacts and appropriate function mapping and type mapping artefacts can be obtained from metadata repository 4 and, if so, creates the appropriate statements and returns to the normal process run as specified in various embodiments of the present invention. If the information necessary for creating this additional infrastructure cannot be obtained from the metadata repository, operation is discontinued and an error code is generated.
  • Second, the federated data management system support module 2 not only determines the set of data sources in step 120 but also checks whether the number of identified data sources is equal to 1 in step 122. If there is only one element in the set of data sources, it is not necessary to transform the DMA description to address the federated data management system, since all data processing operations and data access operations of the DMA description are directed to the same individual data source (5, or 5′ alternatively). In this case, the originally received DMA description is returned for execution in step 150, and no further transformation processing is performed.
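  • The single-source shortcut of step 122 amounts to a guard placed in front of the transformation. In the sketch below, `select_dma_for_execution` and the `transform` callback are hypothetical names used only for illustration.

```python
def select_dma_for_execution(original_dma, sources, transform):
    """Return the DMA description to execute (sketch of steps 120, 122, and 150)."""
    # Step 122: with exactly one data source, federation adds nothing,
    # so the originally received description is returned unmodified.
    if len(set(sources)) == 1:
        return original_dma
    # Otherwise the description is rewritten to address the federated system.
    return transform(original_dma)
```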
  • To give an example for data source access operations in DMA descriptions and statements and infrastructure produced by embodiments of the present invention, consider three activities I1, I2, and I3 specified by a user of the system.
  • I1 loads a first set of business data from a DB2 database for z/OS:
  • SELECT c.cid, sum(a.balance), age(c.birthday), c.income,
       c.profession
    FROM customers c, customer_accounts ca, accounts a
    WHERE c.cid = ca.cid and ca.aid = a.aid
    GROUP BY c.cid
  • This SQL statement returns a set reference (SR1) to the workflow engine (as opposed to a copy of the data set, thus saving data space and network capacity).
  • In a similar manner, the SQL statement I2 loads business data from an Oracle database:
  • SELECT i.cid, sum(i.shares)
    FROM investments i
    GROUP BY i.cid

    and thus returns a set reference (SR2) to the workflow engine.
  • Furthermore, SQL statement I3 loads business data from an Excel data source:
  • SELECT r.cid, r.rating
    FROM ratings r

    thus returning a set reference (SR3) to the workflow engine.
  • Instead of having to write proprietary code (for instance, in the Java™ programming language) to join these three sets and to construct a new set consisting of all tuples that fulfil specific criteria defining the result set, as in the state of the art, the user is enabled, with the system in accordance with one embodiment of the present invention, to write a simple SQL statement to join the three set references and apply filter functions to acquire the desired result set, like the following SQL statement I4 to be executed by the workflow engine (transparently forwarding it to the federated data management system):
  • SELECT s1.cid, s1.balance, s2.shares, s3.rating, s1.age,
      s1.income, s1.profession
    FROM sr1 s1, sr2 s2, sr3 s3
    WHERE s1.cid = s2.cid and s2.cid = s3.cid and s1.balance >
      10000 and s3.rating > 3
  • It is not only easier for users of the workflow system to formulate activity I4 using SQL statements instead of writing a complex and proprietary program to deliver the same result, it is also typically much more efficient to have SQL statements executed via the federated data management system from a performance point of view, since the federated system optimizes the execution of n-way join operations by optimizing for system internal factors like performance costs of different join algorithms, hardware and software speed, or network topology.
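  • To make the shape of statement I4 concrete, the following sketch runs it against three in-memory SQLite tables that stand in for the sets behind SR1, SR2, and SR3. The sample rows are made up, SQLite merely plays the role of the engine that resolves the set references, and the sketch assumes that the sets expose columns named balance, age, and shares.

```python
import sqlite3


def join_set_references():
    """Run statement I4 over stand-in tables sr1, sr2, and sr3 (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sr1 (cid INTEGER, balance REAL, age INTEGER,
                          income REAL, profession TEXT);
        CREATE TABLE sr2 (cid INTEGER, shares INTEGER);
        CREATE TABLE sr3 (cid INTEGER, rating INTEGER);
        INSERT INTO sr1 VALUES (1, 25000, 41, 60000, 'engineer'),
                               (2, 5000, 35, 40000, 'teacher');
        INSERT INTO sr2 VALUES (1, 120), (2, 30);
        INSERT INTO sr3 VALUES (1, 4), (2, 5);
    """)
    rows = conn.execute("""
        SELECT s1.cid, s1.balance, s2.shares, s3.rating, s1.age,
               s1.income, s1.profession
        FROM sr1 s1, sr2 s2, sr3 s3
        WHERE s1.cid = s2.cid AND s2.cid = s3.cid
          AND s1.balance > 10000 AND s3.rating > 3
    """).fetchall()
    conn.close()
    return rows
```

  • With the sample data, only customer 1 satisfies both filter conditions; customer 2 is dropped because the balance does not exceed 10000.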
  • For this example, the infrastructure automatically created by the system in steps 200 to 230 and 104 and 106 is exemplified by the following create statements:
  • create wrapper Net8;
    create server oraserver type oracle version 8 wrapper Net8
      authorization "demo" password "cdidemo" options (node 'iidemo2');
    create user mapping for user SERVER oraserver OPTIONS (
      REMOTE_AUTHID 'demo', REMOTE_PASSWORD 'cdidemo');
    create nickname n2 for oraserver.Investments;
  • This creates the necessary infrastructure for the data access as given under I2, that is, the Oracle® data source and the Oracle® table investments. Creation statements are generated in a similar manner for I1 and I3.
  • In the embodiment shown, data sources are represented in the workflow system as Java™ Naming and Directory Interface (JNDI) names for Java™ Database Connectivity (JDBC) data sources. The metadata repository 4 includes information about the characteristics of the data sources of the present system. Metadata repository 4 further offers a search function that allows specifying at least one characteristic of a database object, such as its JNDI name, and to access the object and its further characteristics.
  • In the above example, the characteristics thus detected from the metadata repository are that the system is of type “Oracle 8”, that it supports the “Net8” protocol, that it can be accessed using its name “iidemo2” when using said protocol, and that, for establishing the connection, user name “demo” and password “cdidemo” can be used. This information is included directly in the respective “create wrapper” and “create server” statements of the above example. For generating the “create user mapping” statement, the metadata repository is queried to provide, for the current authorization-ID under which the user accesses the workflow system, the Oracle user-ID and password that allow said user to log in to the Oracle system.
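  • How the characteristics found in the metadata repository flow into the create statements can be sketched as follows. The repository is modelled as a dictionary keyed by JNDI name; the key `jdbc/investments`, the field names, and the function `create_statements` are hypothetical, whereas the values are those of the example above. Plain string formatting is used for brevity and would need proper escaping in any real system.

```python
# Hypothetical metadata repository entry, looked up by the JNDI name of the data source.
METADATA_REPOSITORY = {
    "jdbc/investments": {
        "server_name": "oraserver",
        "type": "oracle",
        "version": "8",
        "protocol": "Net8",
        "node": "iidemo2",
        "remote_user": "demo",
        "remote_password": "cdidemo",
    }
}


def create_statements(meta, local_user, table, nickname):
    """Build the four create statements of the example from one repository entry."""
    server = meta["server_name"]
    wrapper = meta["protocol"]
    source_type = meta["type"]
    version = meta["version"]
    node = meta["node"]
    user = meta["remote_user"]
    password = meta["remote_password"]
    return [
        f"create wrapper {wrapper};",
        f"create server {server} type {source_type} version {version} "
        f"wrapper {wrapper} authorization \"{user}\" password \"{password}\" "
        f"options (node '{node}');",
        f"create user mapping for {local_user} server {server} options ("
        f"REMOTE_AUTHID '{user}', REMOTE_PASSWORD '{password}');",
        f"create nickname {nickname} for {server}.{table};",
    ]
```

  • Calling `create_statements(METADATA_REPOSITORY["jdbc/investments"], "user", "Investments", "n2")` yields statements equivalent to those listed for activity I2.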
  • Consequently, statement I4 is modified by the system to direct all data accesses to the federated data management system, using the infrastructure as described above to run the following SQL statement:
  • SELECT n1.cid, n1.balance, n2.shares, n3.rating, n1.age,
      n1.income, n1.profession
    FROM n1, n2, n3
    WHERE n1.cid = n2.cid and n2.cid = n3.cid and n1.balance >
      10000 and n3.rating > 3
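  • The replacement of set references by nicknames can be approximated by a whole-word substitution over the statement text. The function `rewrite_references` is a hypothetical helper; unlike the rewritten statement above, the sketch keeps the original correlation names (s1, s2, s3), which yields an equivalent query. A purely textual rewrite would also touch matching words inside string literals, so a real implementation would more likely operate on a parsed statement.

```python
import re


def rewrite_references(sql, mapping):
    """Replace each set reference name in sql by its nickname (whole words only)."""
    if not mapping:
        return sql
    # Longest names first, so that one name cannot shadow another it is a prefix of.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], sql)


i4 = ("SELECT s1.cid, s2.shares, s3.rating "
      "FROM sr1 s1, sr2 s2, sr3 s3 "
      "WHERE s1.cid = s2.cid and s2.cid = s3.cid")
rewritten = rewrite_references(i4, {"sr1": "n1", "sr2": "n2", "sr3": "n3"})
```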
  • In order to obtain the necessary information to build the infrastructure, as described above, the federated data management system support module acquires information about the data sources and the used tables in the SQL statements specified by the user by accessing a metadata management system 4. The metadata management system 4 is capable of storing additional information about the system environment and is able to use different types of specifications for data sources depending on the specific federated system and the specific workflow system. For instance, the workflow engine might use Java Database Connectivity (JDBC) connections and specifications whereas the federated data management system might use ODBC connections and specifications. The metadata management system 4 holds such mapping information and further holds mapping information for different user managements.
  • Typically, each of the systems includes a particular user management, so that a user having access to a first database 5 in the system is not automatically granted access to a second database 5′ in the system. Therefore, many such user mappings may exist, all of which are to be reported to the metadata repository. Alternatively, a common user directory can be implemented for all participating systems, for instance a directory service using the Lightweight Directory Access Protocol (LDAP). Combinations of central and distributed user management are also possible. In every case, however, the metadata repository contains information about these configurations.
  • Another embodiment of the present invention is a combination of the foregoing embodiments with the optimization of workflows in a workflow system as specified in the European Patent Application No. EP05108096.8, entitled “Optimization in Workflow Management Systems,” by International Business Machines Corporation. The embodiments described therein relate to optimization of data management activities in workflows. When executing the optimization steps on a workflow having a DMA description for which the transformation steps of the embodiments of the present invention as described above have been performed, the various data request operations, such as I1, I2, and I3 included in a DMA description can be optimized to be directed towards a single data source, namely the federated data management system. This is achieved by adding an optimization pattern in a combined workflow optimizing/workflow processing system with federated database system support: Rewrite statement referring to data source X to statement that refers to the federated system instead. Thus, operations I1, I2, and I3 are rewritten to be directed to the federated system instead of going directly to the Oracle database, DB2 database, or Excel. The necessary infrastructure in the federated system is obtained as described before. The resulting rewritten statements are, in our example, as follows:
  • I1 (Load from DB2 for z/OS)
    SELECT c.cid, sum(a.balance), age(c.birthday), c.income,
      c.profession
    FROM nick_customers c, nick_customer_accounts ca,
      nick_accounts a
    WHERE c.cid = ca.cid and ca.aid = a.aid
    GROUP BY c.cid
    I2 (Load from Oracle)
    SELECT i.cid, sum(i.shares)
    FROM nick_investments i
    GROUP BY i.cid
    I3 (Load from Excel)
    SELECT r.cid, r.rating
    FROM nick_ratings r
  • Each of those SQL statements is executed against the workflow engine and returns a set reference to the workflow engine appropriately. In the FROM clauses of the SQL statements, nicknames are used that have been created in the federated system. Further, the resulting set references are pointing into the federated system and not to the individual data sources.
  • Such an optimization and rewriting of DMA descriptions makes other optimization patterns of the above-referenced patent application “Optimization in Workflow Management Systems” applicable as well. Considering another example below, and assuming that the determination of clusters in I5 is implemented using SQL technology, it is possible to rewrite the activities I1 to I5 into a single activity that will be executed significantly more efficiently than the original statements I1 to I5. The following portions of DMA descriptions show original statements I1 to I5 and optimized SQL statement O12:
  • I1 (Load from DB2 for z/OS) is a SQL statement executed against a DB2 for z/OS data source and returns a set reference SR1 to the workflow engine:
  • SELECT c.cid, sum(a.balance), age(c.birthday), c.income,
      c.profession
    FROM customers c, customer_accounts ca, accounts a
    WHERE c.cid = ca.cid and ca.aid = a.aid
    GROUP BY c.cid

    I2 (Load from Oracle) is a SQL statement executed against an Oracle data source and returns a set reference SR2 to the workflow engine:
  • SELECT i.cid, sum(i.shares)
    FROM investments i
    GROUP BY i.cid

    I3 (Load from Excel) is a SQL statement executed against an Excel data source and returns a set reference SR3 to the workflow engine:
  • SELECT r.cid, r.rating
    FROM ratings r

    I4 (Join) is a SQL statement executed against the workflow engine and returns a set reference SR4 to the workflow engine:
  • SELECT s1.cid, s1.balance, s2.shares, s3.rating, s1.age,
      s1.income, s1.profession
    FROM sr1 s1, sr2 s2, sr3 s3
    WHERE s1.cid = s2.cid and s2.cid = s3.cid and s1.balance >
      10000 and s3.rating > 3

    I5 (Clustering) is a SQL statement executed against the workflow engine and returns a set S5 to the workflow engine:
  • SELECT s4.cid, s4.balance, s4.shares, s4.rating,
    idmmx.dm_applyClusModel(c.model,
      idmmx.dm_impApplData(
      rec2xml(1.0,'COLATTVAL','',
      s4.age, s4.income, s4.profession)))
    FROM sr4 s4, idmmx.clustermodels c
    WHERE c.modelname='customerCluster';

    O12 (Optimization Group Result) is a SQL statement executed against the workflow engine and returns a set S5 to the workflow engine:
  • SELECT s4.cid, s4.balance, s4.shares, s4.rating,
      idmmx.dm_applyClusModel(c.model, idmmx.dm_impApplData(
      rec2xml(1.0,'COLATTVAL','', s4.age, s4.income, s4.profession)))
    FROM(SELECT s1.cid, s1.balance, s2.shares, s3.rating, s1.age,
      s1.income, s1.profession FROM
        (SELECT c.cid, sum(a.balance), age(c.birthday),
        c.income, c.profession
        FROM nick_customers c, nick_customer_accounts ca,
        nick_accounts a WHERE c.cid = ca.cid and ca.aid =
        a.aid GROUP BY c.cid) as s1,
        (SELECT i.cid, sum(i.shares) FROM nick_investments i
        GROUP BY i.cid) as s2,
        (SELECT r.cid, r.rating FROM nick_ratings r) as s3
        WHERE s1.cid = s2.cid and s2.cid = s3.cid and
        s1.balance > 10000 and s3.rating > 3) as s4,
        idmmx.clustermodels c
    WHERE c.modelname='customerCluster';
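  • The nesting seen in O12 can be reproduced mechanically by replacing each set reference in a FROM clause with the statement that defines it, wrapped as a derived table. The sketch below is an assumption-laden illustration: `inline_set_references` is a hypothetical helper, it expects every set reference to be followed by a correlation name, and it works on text rather than on a parsed statement.

```python
import re


def inline_set_references(sql, definitions):
    """Replace each 'name alias' pair by '(defining statement) AS alias'.

    definitions maps a set reference name to the SQL text that produces its set.
    """
    if not definitions:
        return sql
    names = "|".join(re.escape(name) for name in definitions)
    pattern = re.compile(rf"\b({names})\s+(\w+)\b")

    def substitute(match):
        name, alias = match.group(1), match.group(2)
        return f"({definitions[name]}) AS {alias}"

    return pattern.sub(substitute, sql)


# Inner level: the join (I4) over the rewritten loads (abbreviated I2 and I3).
loads = {
    "sr2": "SELECT i.cid, sum(i.shares) FROM nick_investments i GROUP BY i.cid",
    "sr3": "SELECT r.cid, r.rating FROM nick_ratings r",
}
join = inline_set_references(
    "SELECT s2.cid, s3.rating FROM sr2 s2, sr3 s3 WHERE s2.cid = s3.cid", loads
)
# Outer level: a statement over SR4 absorbs the whole join as one derived table.
merged = inline_set_references("SELECT s4.cid FROM sr4 s4", {"sr4": join})
```

  • Applying the helper level by level, first to the join and then to the statement that consumes its result, yields a single nested statement of the same form as O12, which the federated system can then optimize as a whole.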
  • Embodiments of the invention can include one or more of the following advantages. Workflow activity description means that describe specific nodes in a workflow and that express data management activities such as SQL statements, stored procedures, XQuery expressions, and so on, can be handled. By processing set references, the data does not need to be copied from the data source to the workflow system at runtime, since set references allow the system (and thus the user) to refer to sets (database tables) rather than copy them. Since the set references can be passed between activities instead of the actual sets, performance of the workflow system can be significantly improved. Further, by determining the set references associated with the data management activity and determining the set of data sources associated with the set of set references, the data necessary for detecting if the federated data management system comprises the necessary infrastructure for accessing the set of set references and the set of data sources is collected. Each data source in the set of data sources that is to profit from embodiments of the present invention is serviced by the federated data management system. Thus, it is possible to centrally address the data sources through the federated data management system rather than individually addressing each data source with the respective request for the particular set of data.
  • Unnecessary data processing steps of creating the necessary infrastructure are avoided if the infrastructure is already present in the system, for instance in the case it has been created in a previous run.
  • The user of the workflow system does not have to deal with the additional complexity of manually addressing the federated data management system. The workflow system user can use the workflow management system in a straightforward manner and directly formulate data processing logic directed towards the system. The federated data management system support module comprised in the workflow processing system carrying out the method automatically takes care of setting up the necessary infrastructure in the federated system that is needed to run the queries which have been formulated.
  • A state of the art metadata source can be used to retrieve information about system components and their capabilities and contents for all system components of the whole system landscape.
  • The original references programmed by the user of the workflow management system in the manner the user is used to are automatically replaced with centralized references to the federated database management system, so that the resulting modified data management activity description can be executed with the federated data management system, the federated data management system taking care of collecting the data from the various data sources.
  • Various embodiments of the present invention thus allow users to specify separate queries against component databases while writing efficient and easy-to-handle join statements, all of which can, for instance, be formulated in the Structured Query Language (SQL). Further, users of the workflow system can specify elaborate queries against the workflow system in a direct manner. In both cases, the workflow system in accordance with various embodiments of the present invention takes care of accessing the data over the federated data management system.
  • Data management statements intrinsic to the data handling system can be used for creating the infrastructure, and the created infrastructure is stored for reuse in future runs, thus improving efficiency of the system.
  • When creating additional infrastructure, information can be derived from the metadata repository coupled to the system as well. For implementing such an embodiment, the steps of creating a user mapping artefact as well as creating a function mapping artefact and type mapping artefacts can be utilized.
  • The system can detect if all data access operations are going against the same data source, in which case a federation is not necessary, and thus saves processing time by not utilizing the federated data management system but having the workflow management system directing the data management activity directly to the individual data source.
  • Combining embodiments of the present invention with the workflow optimization described in the above-referenced patent application allows the addition of a pattern specifying that statements against one or more individual data sources are to be rewritten into statements that are directed to the federated data management system instead. Thus, requests to individual data sources are transformed to be directed to the federated data management system, and the system and method in accordance with various embodiments of the present invention further automatically generate the infrastructure necessary for executing the data operations, providing additional synergistic effects.
  • The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In an embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and so on.
  • Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
  • To avoid unnecessary repetitions, explanations given for one of the various embodiments are intended to refer to the other embodiments as well, where applicable. Reference signs in the claims shall not be construed as limiting the scope. The use of “comprising” in this application does not mean to exclude other elements or steps and the use of “a” or “an” does not exclude a plurality. A single unit or element may fulfil the functions of a plurality of means recited in the claims.

Claims (18)

1. A method for automatic workflow processing in a workflow processing computer system, comprising:
receiving, by a data management system support module in the workflow processing computer system, a data management activity description;
determining, by the data management system support module, a set of set references associated with the data management activity;
determining, by the data management system support module and using the set of set references, a set of data sources associated with the set of set references, the set of data sources being provided within a data management system;
determining automatically, by the data management system support module using the set of set references and the set of data sources, whether the data management system includes infrastructure for accessing the set of set references and for accessing the set of data sources;
in response to determining that the infrastructure is not included, automatically creating, by the data management system support module, the infrastructure from information in a metadata repository coupled to the data management system;
replacing in the data management activity description, by the data management system support module, references to set references and references to data sources by references to the infrastructure in the data management system; and
delivering, by the data management system support module, the data management activity description for execution by the data management system.
2. The method of claim 1, wherein the infrastructure is created by at least one of the steps of:
creating a wrapper artefact for each different type of data source in the set of data sources;
creating a server artefact for each different data source in the set of data sources;
creating a nickname artefact for each different table to be accessed; and
wherein the created infrastructure is added to the system.
3. The method of claim 1, further comprising:
determining whether the data management system supports the operations defined in the data management activity description; and
in response to determining that the operations are not supported, creating additional infrastructure to support the operations.
4. The method of claim 3, wherein creating additional infrastructure comprises:
creating a user mapping artefact; and
creating function mapping artefacts and type mapping artefacts.
5. The method of claim 1, further comprising:
determining the number of data sources in the set of data sources; and
in response to determining that there is a single data source, delivering the original data management activity description unmodified for execution.
6. The method of claim 1, wherein a group of activities included in a workflow, the group of activities including at least one data management activity, is optimized by the following steps performed by the workflow management system:
determining the at least one data management activity;
determining at least one data level statement for each of the at least one data management activities;
determining the group of activities;
determining a process graph model from the group of activities, wherein the process graph model includes each of the at least one data level statements, and wherein the semantics of the process graph model is identical to the semantics of the group of activities;
determining an optimized process graph model from the process graph model;
determining an optimized group of activities from the optimized process graph model, whereby the semantics of the optimized group of activities is identical to the semantics of the optimized process graph model;
replacing, in the workflow, the group of activities by the optimized group of activities,
wherein the process graph model includes a pattern, the optimized pattern is determined from the process graph model by optimizing the pattern, and wherein further the pattern refers to directing a database statement to a particular data source, and the pattern is optimized by transforming the pattern to a corresponding statement being directed to the data management system.
7. A workflow processing computer system, coupled to a data management system, at least one data store, and a metadata management system, the workflow processing computer system comprising:
a data management system support module configured to:
receive a data management activity description;
determine a set of set references associated with the data management activity;
determine, using the set of set references, a set of data sources associated with the set of set references, the set of data sources being provided within a data management system;
determine automatically, using the set of set references and the set of data sources, whether the data management system includes infrastructure for accessing the set of set references and for accessing the set of data sources;
in response to determining that the infrastructure is not included, automatically create the infrastructure from information in a metadata repository coupled to the data management system;
replace in the data management activity description references to set references and references to data sources by references to the infrastructure in the data management system; and
deliver the data management activity description for execution by the data management system.
8. The workflow processing computer system of claim 7, wherein the infrastructure is created by at least one of the steps of:
creating a wrapper artefact for each different type of data source in the set of data sources;
creating a server artefact for each different data source in the set of data sources;
creating a nickname artefact for each different table to be accessed; and
wherein the created infrastructure is added to the system.
9. The workflow processing computer system of claim 7, wherein the data management system support module further is configured to:
determine whether the data management system supports the operations defined in the data management activity description; and
in response to determining that the operations are not supported, create additional infrastructure to support the operations.
10. The workflow processing computer system of claim 9, wherein creating additional infrastructure comprises:
creating a user mapping artefact; and
creating function mapping artefacts and type mapping artefacts.
11. The workflow processing computer system of claim 7, wherein the data management system support module further is configured to:
determine the number of data sources in the set of data sources; and
in response to determining that there is a single data source, deliver the original data management activity description unmodified for execution.
12. The workflow processing computer system of claim 7, wherein a group of activities included in a workflow, the group of activities including at least one data management activity, is optimized by the following steps performed by the workflow processing computer system:
determining the at least one data management activity;
determining at least one data level statement for each of the at least one data management activities;
determining the group of activities;
determining a process graph model from the group of activities, wherein the process graph model includes each of the at least one data level statements, and wherein the semantics of the process graph model is identical to the semantics of the group of activities;
determining an optimized process graph model from the process graph model;
determining an optimized group of activities from the optimized process graph model, whereby the semantics of the optimized group of activities is identical to the semantics of the optimized process graph model;
replacing, in the workflow, the group of activities by the optimized group of activities,
wherein the process graph model includes a pattern, the optimized pattern is determined from the process graph model by optimizing the pattern, and wherein further the pattern refers to directing a database statement to a particular data source, and the pattern is optimized by transforming the pattern to a corresponding statement being directed to the data management system.
13. A computer program product for automatic workflow processing in a workflow processing computer system, the computer program product comprising a non-transitory tangible computer useable storage medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
receive, by a data management system support module in the workflow processing computer system, a data management activity description;
determine, by the data management system support module, a set of set references associated with the data management activity;
determine, by the data management system support module and using the set of set references, a set of data sources associated with the set of set references, the set of data sources being provided within a data management system;
determine automatically, by the data management system support module using the set of set references and the set of data sources, whether the data management system includes infrastructure for accessing the set of set references and for accessing the set of data sources;
in response to determining that the infrastructure is not included, automatically create, by the data management system support module, the infrastructure from information in a metadata repository coupled to the data management system;
replace in the data management activity description, by the data management system support module, references to set references and references to data sources by references to the infrastructure in the data management system; and
deliver, by the data management system support module, the data management activity description for execution by the data management system.
14. The computer program product of claim 13, wherein the infrastructure is created by at least one of the steps of:
creating a wrapper artefact for each different type of data source in the set of data sources;
creating a server artefact for each different data source in the set of data sources;
creating a nickname artefact for each different table to be accessed; and
wherein the created infrastructure is added to the system.
15. The computer program product of claim 13, wherein the computer readable program when executed on a computer further causes the computer to:
determine whether the data management system supports the operations defined in the data management activity description; and
in response to determining that the operations are not supported, create additional infrastructure to support the operations.
16. The computer program product of claim 15, wherein creating additional infrastructure comprises:
creating a user mapping artefact; and
creating function mapping artefacts and type mapping artefacts.
17. The computer program product of claim 13, wherein the computer readable program when executed on a computer further causes the computer to:
determine the number of data sources in the set of data sources; and
in response to determining that there is a single data source, deliver the original data management activity description unmodified for execution.
18. The computer program product of claim 13, wherein a group of activities included in a workflow, the group of activities including at least one data management activity, is optimized by the following steps performed by the workflow management system:
determining the at least one data management activity;
determining at least one data level statement for each of the at least one data management activities;
determining the group of activities;
determining a process graph model from the group of activities, wherein the process graph model includes each of the at least one data level statements, and wherein the semantics of the process graph model is identical to the semantics of the group of activities;
determining an optimized process graph model from the process graph model;
determining an optimized group of activities from the optimized process graph model, whereby the semantics of the optimized group of activities is identical to the semantics of the optimized process graph model;
replacing, in the workflow, the group of activities by the optimized group of activities,
wherein the process graph model includes a pattern, the optimized pattern is determined from the process graph model by optimizing the pattern, and wherein further the pattern refers to directing a database statement to a particular data source, and the pattern is optimized by transforming the pattern to a corresponding statement being directed to the data management system.
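
As an illustration only, the flow recited in claims 1, 2, and 5 can be sketched in Python. Everything named here is hypothetical and not taken from the specification: the `DataSource` and `FederatedCatalog` classes, the `#name#` placeholder syntax for set references, the dictionary standing in for the metadata repository, and the `NN_<source>_<table>` nickname convention. The sketch records wrapper, server, and nickname artefacts in memory rather than issuing real statements against a data management system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    name: str  # hypothetical server name, e.g. "SRC_A"
    kind: str  # hypothetical source type, e.g. "oracle"


@dataclass
class FederatedCatalog:
    """In-memory stand-in for the data management system's infrastructure."""
    wrappers: set = field(default_factory=set)     # one per source type
    servers: set = field(default_factory=set)      # one per data source
    nicknames: dict = field(default_factory=dict)  # (source name, table) -> nickname


def prepare_activity(statement, set_refs, metadata, catalog):
    """Return the statement to deliver to the data management system.

    statement: text using '#ref#' placeholders (an assumed syntax)
    set_refs:  set reference names used by the activity
    metadata:  maps each set reference to a (DataSource, table name) pair
    """
    sources = {metadata[ref][0] for ref in set_refs}
    # Claim 5: with a single data source the description is delivered unmodified.
    if len(sources) == 1:
        return statement

    for ref in set_refs:
        source, table = metadata[ref]
        # Claim 2: create only the artefacts that are not already present.
        if source.kind not in catalog.wrappers:
            catalog.wrappers.add(source.kind)
        if source.name not in catalog.servers:
            catalog.servers.add(source.name)
        key = (source.name, table)
        if key not in catalog.nicknames:
            catalog.nicknames[key] = f"NN_{source.name}_{table}"
        # Claim 1: replace the set reference by a reference to the infrastructure.
        statement = statement.replace(f"#{ref}#", catalog.nicknames[key])
    return statement
```

Under these assumptions, calling `prepare_activity` on a statement that joins `#orders#` and `#customers#` held in two different sources would register two wrappers, two servers, and two nicknames, and return the statement with both placeholders replaced by nicknames. A statement touching only one source is returned unchanged and the catalog is left untouched.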

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/094,101 US20160224623A1 (en) 2006-12-04 2016-04-08 Workflow Processing System and Method with Database System Support

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
EP061252938 2006-12-04
EP06125293 2006-12-04
US11/849,742 US8250583B2 (en) 2006-12-04 2007-09-04 Workflow processing system and method with federated database system support
US13/472,308 US9342572B2 (en) 2006-12-04 2012-05-15 Workflow processing system and method with database system support
US15/094,101 US20160224623A1 (en) 2006-12-04 2016-04-08 Workflow Processing System and Method with Database System Support

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/472,308 Continuation US9342572B2 (en) 2006-12-04 2012-05-15 Workflow processing system and method with database system support

Publications (1)

Publication Number Publication Date
US20160224623A1 (en) 2016-08-04

Family

ID=38889523

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/849,742 Expired - Fee Related US8250583B2 (en) 2006-12-04 2007-09-04 Workflow processing system and method with federated database system support
US13/472,308 Expired - Fee Related US9342572B2 (en) 2006-12-04 2012-05-15 Workflow processing system and method with database system support
US15/094,101 Abandoned US20160224623A1 (en) 2006-12-04 2016-04-08 Workflow Processing System and Method with Database System Support

Country Status (3)

Country Link
US (3) US8250583B2 (en)
EP (1) EP2126812A1 (en)
WO (1) WO2008068114A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7499906B2 (en) * 2005-09-05 2009-03-03 International Business Machines Corporation Method and apparatus for optimization in workflow management systems
US8250583B2 (en) 2006-12-04 2012-08-21 International Business Machines Corporation Workflow processing system and method with federated database system support
JP5146020B2 (en) * 2008-03-10 2013-02-20 富士通株式会社 Information processing apparatus, resource identification program, and resource identification method
BRPI0906540A2 (en) 2008-04-04 2015-09-22 Landmark Graphics Corp devices and methods for correlating metadata model representations and logical model representations of assets
US10552391B2 (en) 2008-04-04 2020-02-04 Landmark Graphics Corporation Systems and methods for real time data management in a collaborative environment
CN102053975A (en) * 2009-10-30 2011-05-11 国际商业机器公司 Database system and cross-database query optimization method
US8543932B2 (en) * 2010-04-23 2013-09-24 Datacert, Inc. Generation and testing of graphical user interface for matter management workflow with collaboration
US9818078B1 (en) * 2013-03-12 2017-11-14 Amazon Technologies, Inc. Converting a non-workflow program to a workflow program using workflow inferencing
US10417244B2 (en) 2014-09-22 2019-09-17 Red Hat, Inc. Just-in-time computation in a federated system
US10339151B2 (en) * 2015-02-23 2019-07-02 Red Hat, Inc. Creating federated data source connectors
US10432716B2 (en) 2016-02-29 2019-10-01 Bank Of America Corporation Metadata synchronization system
CN109799976B (en) * 2019-01-11 2022-04-01 上海凯岸信息科技有限公司 Real-time wind control variable calculation method based on distributed stream type calculation engine
US11561976B1 (en) * 2021-09-22 2023-01-24 Sap Se System and method for facilitating metadata identification and import

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6009428A (en) * 1997-09-15 1999-12-28 International Business Machines Corporation System and method for providing a single application program interface for heterogeneous databases
US20060224564A1 (en) * 2005-03-31 2006-10-05 Oracle International Corporation Materialized view tuning and usability enhancement
US20080312959A1 (en) * 2005-08-19 2008-12-18 Koninklijke Philips Electronics, N.V. Health Care Data Management System
US8078588B2 (en) * 2005-10-10 2011-12-13 Oracle International Corporation Recoverable execution

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19535084A1 (en) * 1995-09-21 1997-03-27 Ibm Dynamic optimisation of business processes managed by computer system
JP2003528358A (en) * 1998-08-24 2003-09-24 富士通株式会社 Workflow system and method
US6606740B1 (en) * 1998-10-05 2003-08-12 American Management Systems, Inc. Development framework for case and workflow systems
DE19948028A1 (en) * 1998-11-20 2000-05-31 Ibm Request dispatch optimization method for computerized workflow management system has overall optimization function effected by local work management system for reconfiguration of remote work management systems
AU5904700A (en) * 1999-07-01 2001-01-22 Microsoft Corporation Workflow as data-transition driven, scriptable state machines
WO2002019226A1 (en) * 2000-09-01 2002-03-07 Togethersoft Corporation Methods and systems for optimizing resource allocation based on data mined from plans created from a workflow
US20020032692A1 (en) * 2000-09-08 2002-03-14 Atsuhito Suzuki Workflow management method and workflow management system of controlling workflow process
US7027997B1 (en) * 2000-11-02 2006-04-11 Verizon Laboratories Inc. Flexible web-based interface for workflow management systems
US7054862B2 (en) * 2001-06-29 2006-05-30 International Business Machines Corporation Method and system for long-term update and edit control in a database system
US6920456B2 (en) * 2001-07-30 2005-07-19 International Business Machines Corporation Method, system, and program for maintaining information in database tables and performing operations on data in the database tables
US7337124B2 (en) * 2001-08-29 2008-02-26 International Business Machines Corporation Method and system for a quality software management process
US20030074342A1 (en) * 2001-10-11 2003-04-17 Curtis Donald S. Customer information management infrastructure and methods
US20040003353A1 (en) * 2002-05-14 2004-01-01 Joey Rivera Workflow integration system for automatic real time data management
US7653562B2 (en) * 2002-07-31 2010-01-26 Sap Aktiengesellschaft Workflow management architecture
US7350188B2 (en) * 2002-07-31 2008-03-25 Sap Aktiengesellschaft Aggregation of private and shared workflows
TW200419413A (en) * 2003-01-13 2004-10-01 I2 Technologies Inc Master data management system for centrally managing core reference data associated with an enterprise
US8332864B2 (en) 2003-06-12 2012-12-11 Reuters America Inc. Business process automation
EP1636743A1 (en) 2003-06-26 2006-03-22 International Business Machines Corporation Method and system for automatically transforming a provider offering into a customer specific service environment definiton executable by resource management systems
FI118102B (en) * 2003-07-04 2007-06-29 Medicel Oy Information control system for controlling the workflow
US7386577B2 (en) 2004-02-04 2008-06-10 International Business Machines Corporation Dynamic determination of transaction boundaries in workflow systems
US20050209841A1 (en) * 2004-03-22 2005-09-22 Andreas Arning Optimization of process properties for workflows with failing activities
US7496887B2 (en) 2005-03-01 2009-02-24 International Business Machines Corporation Integration of data management operations into a workflow system
US7499906B2 (en) * 2005-09-05 2009-03-03 International Business Machines Corporation Method and apparatus for optimization in workflow management systems
US8250583B2 (en) 2006-12-04 2012-08-21 International Business Machines Corporation Workflow processing system and method with federated database system support

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Haas et al; IBM Federated Database Technology; March 1 2002 *
Lurie; The Federation-Database Interoperability (Part 1) April 24 2003 *
Oracle Workflow Developer's Guide, Release 2.6.3 September 2003 *

Also Published As

Publication number Publication date
WO2008068114A1 (en) 2008-06-12
US9342572B2 (en) 2016-05-17
EP2126812A1 (en) 2009-12-02
US20080134198A1 (en) 2008-06-05
US20120227055A1 (en) 2012-09-06
US8250583B2 (en) 2012-08-21

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GRASSELT, MIKE;MAIER, ALBERT;MITSCHANG, BERNHARD;AND OTHERS;SIGNING DATES FROM 20070823 TO 20070903;REEL/FRAME:038228/0232

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION