CA2980333A1

CA2980333A1 - Field specialization systems and methods for improving program performance

Info

Publication number: CA2980333A1
Application number: CA2980333A
Authority: CA
Inventors: Richard T. Snodgrass; Saumya K. Debray; Rui Zhang; Stephen Thomas; Sean Mason
Original assignee: Dataware Ventures LLC; University of Arizona
Current assignee: Dataware Ventures LLC; University of Arizona
Priority date: 2015-04-02
Filing date: 2016-03-31
Publication date: 2016-10-06
Also published as: CN107851003A; EP3278218A1; WO2016161130A1; JP2018510445A; EP3278218A4

Abstract

Systems and methods for improving the performance of a computer program, such as a database management system (DBMS), are provided. Such methods involve identifying invariant intervals for variables in the DBMS code, based on a Program Representation (PR). Program interactions within the DBMS and domain assertions are deduced, based on the PR and an Ecosystem Specification for the DBMS. One or more candidate snippets are identified, based on the invariant intervals for variables in the DBMS code, the PR, one or more execution summaries associated with the DBMS, the deduced program interactions and the deduced domain assertions. Spiffs are then generated, based on the one or more candidate snippets. Such spiffs include predicate query spiff, hash-join query spiff, aggregate spiff, page spiff, and string matching spiff. The DBMS code is modified based on the speccode generated from these spiffs.

Description

2 PROGRAM PERFORMANCE

3 The present disclosure is generally related to field specialization for improving

4 performance of a computer program, and more particularly is related to systems and methods for improving the performance of database management systems by identifying 6 invariant intervals for variables and modifying the DBMS code utilizing specialized code 7 generated, at least in part, based on the identified invariant intervals.
8 A database management system (DBMS) is a collection of software programs that 9 manage the storage and access of data. As larger volumes of data are being generated nowadays and thus must be stored and efficiently accessed, DBMSes have been adopted 11 across a wide range of application domains. Driven by such ubiquitous deployments over 12 the last four decades, DBMSes have been designed and engineered based on a few data 13 models that are generally applicable to those domains. The relational data model is the one 14 most prevalently adopted by commercial and open-source DBMSes. A
significant amount of effort has been devoted to efficiently support this data model.
16 Due to the generality of the relational data model, relational database management 17 systems are themselves general, in that they can handle whatever schema the user specifies 18 and whatever query or modification is presented to them. Relational operators work on 19 essentially any relation and must contend with predicates specified on any attribute of the underlying relations. Through such innovations as effective indexing structures, innovative 21 concurrency control mechanisms, and sophisticated query optimization strategies, the 22 relational DBMSes available today are very efficient. Such generality and efficiency has 23 enabled their proliferation and use in many domains.
24 Nevertheless, such generality is realized via multiple layers of indirections and sophisticated code logic. Efficiency can be further enhanced for DBMSes by exploiting 26 invariant values present during the execution of such systems. The field specialization 27 technology disclosed in the present application is developed to automatically identify 28 invariants and effect code specialization based upon invariants.
29 Embodiments of the present disclosure provide systems and methods for improving the performance of a database management system (DBMS). Briefly described, one 31 embodiment of the method, among others, can be implemented as follows. A
computer-1 implemented method for improving the performance of a DBMS includes the steps of: (i) 2 identifying, based on a compile-time analysis of the DBMS source code, invariant intervals 3 for variables in the DBMS code; (ii) deducing, based on the source code and an Ecosystem 4 Specification for the DBMS, program interactions within the DBMS;
deducing, based on the source code, the identified invariant intervals for variables in the DBMS code and the 6 deduced program interactions, termed domain assertions; (iii) identifying, based on the 7 invariant intervals for variables in the DBMS code, the source code, and one or more 8 execution summaries associated with the DBMS executed using various workloads, the 9 deduced program interactions, and the deduced domain assertions, one or more candidate snippets; (iv) generating specialized DBMS code, at various time points including compile 11 time and runtime based on the identified candidate snippets; and (v) modifying the DBMS
12 by inserting code that may be utilized to perform the specialized code generation possibly at 13 runtime and later invoke the specialized code.
14 Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and 16 detailed description. It is intended that all such additional systems, methods, features, and 17 advantages be included within this description, be within the scope of the present disclosure, 18 and be protected by the accompanying claims.
19 Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis 21 instead being placed upon clearly illustrating the principles of the present disclosure.
22 Moreover, in the drawings, like reference numerals designate corresponding parts 23 throughout the several views.
24 FIG. 1 is a block diagram illustrating the spiff tool architecture, in accordance with an exemplary embodiment provided by this disclosure.
26 FIG. 2 is a block illustration of a field specialization process in accordance with an 27 exemplary embodiment provided by this disclosure.
28 FIG. 3 is an illustration of field specialization for elaboration a paradigm of 29 computer science with an exemplary embodiment provided by this disclosure.

1 Many embodiments of the disclosure may take the form of computer-executable 2 instructions, including algorithms executed by a programmable computer.
However, the 3 disclosure can be practiced with other computer system configurations as well. Certain 4 aspects of the disclosure can be embodied in a special-purpose computer or data processor that is specifically programmed, configured or constructed to perform one or more of the 6 computer-executable algorithms described below.
7 The disclosure also can be practiced in distributed computing environments, where 8 tasks or modules are performed by remote processing devices that are linked through a 9 communications network. Moreover, the disclosure can be practiced in Internet-based or cloud computing environments, where shared resources, software and information may be 11 provided to computers and other devices on demand. In a distributed computing 12 environment, program modules or subroutines may be located in both local and remote 13 memory storage devices. Aspects of the disclosure described below may be stored or 14 distributed on computer-readable media, including magnetic and optically readable and removable computer disks, fixed magnetic disks, floppy disk drive, optical disk drive, 16 magneto-optical disk drive, magnetic tape, hard-disk drive (HDD), solid state drive (SSD), 17 compact flash or non-volatile memory, as well as distributed electronically over networks 18 including the cloud. Data structures and transmissions of data particular to aspects of the 19 disclosure are also encompassed within the scope of the disclosure.
While the present invention may be described herein primarily with respect to a 21 relational DBMS, the present invention is not limited to such a DBMS
type. It will be 22 readily understood that the present invention may be applied to any DBMS
type, including, 23 but not limited to, hierarchical, network and object-oriented DBMS
types. Moreover, while 24 field specialization is disclosed herein primarily with respect to a DBMS, it should be understood that the concepts provided herein may be applied to any program that 26 manipulates data and in particular, performs complex analysis on that data. Specifically, it is 27 understood that the system and method disclosed may also be applicable to computer 28 programs that require high run-time performance and execute the application multiple times 29 over the same data, but with different parameters or queries. A "spiff,"
which stands for specializer in the field, is code that dynamically creates specialized code at DBMS runtime.
31 "Field specialization" is the process of inserting spiffs into DBMS code so that the DBMS

1 can specialize itself by exploiting runtime invariants. The specialized code (which may be 2 referred to herein as "speccode") is faster and generally smaller than the original 3 unspecialized code. Field specialization gets its name from the fact that the speccode is 4 generated and invoked "in the field," i.e., after the DBMS has been deployed and is running at the end user's site. A spiff uses the actual value of a runtime invariant¨which is obtained 6 at runtime _______________________________________________________ to dynamically produce code that is specialized to that particular value of the 7 runtime invariant.
8 In Applicants' co-pending U.S. Patent Application serial number 14/368,265, which 9 the present application claims priority to, the term "micro-specialization" is equivalent to the term "field specialization" as used herein; the term "bee" is equivalent to the term "spiff' as 11 used herein; an instantiated bee is equivalent to "specialized code" as used herein, which is 12 the result of a spiff; and the HRE (hive runtime environment) is equivalent to "SRE" (spiff 13 runtime environment) as used herein.
14 Architecture FIG. 1 is a block diagram illustrating the sniff tool architecture, in accordance with 16 an exemplary embodiment provided by this disclosure.
17 In one embodiment, the present disclosure provides a spiff tools architecture that 18 automatically field specializes an arbitrary program, given three inputs, as shown in FIG. 1:
19 = the application's Source Code, the obvious input, = one or more Workloads, and 21 = an Ecosystem Specification.
22 This architecture thus posits that field specialization will analyze the application 23 source code and eventually be an entirely automatic process utilizing these three inputs to 24 produce a specialized application having the identical semantics but running faster.
26 Goals 27 The goals of this architecture include the following.
28 1. Provide an end-to-end solution that takes source files for a program or suite of 29 related programs and automatically provides field specialized versions of those source files, including the code for spiff generation, spiff compilation, spiff instantiation, spiff 31 invocation, and spiff garbage collection.

1 2. Provide domain independence, in that the architecture works with virtually any 2 program, compiled to any conventional architecture, including highly parallel Graphical 3 Processing Units (GPUs).
4 3. Minimize the information needed to be provided by the tool user and maximize the information extracted from the program under analysis.
6 4. Partition the analysis into a sequence of tools, each of which performs only one 7 conceptual task.
8 5. Enable scalability of the tools, by ensuring that each tool produces a small output 9 that captures the result of that tool's analysis.
6. Enable incremental development, so that tools can be tested initially on small 11 programs and then on realistic programs.
12 7. Enable successive refinement, in that each tool can initially do just a partial, best-13 effort analysis (e.g., finding just some of the invariants, or minimal candidate snippets) and 14 then be refined to produce a more comprehensive output over time.
8. Enable performance benefit estimation, as the benefit of each individual code 16 transformation introduced by a spiff can be evaluated dynamically and/or independently; the 17 overall benefit of that spiff can be computed by taking into account the effected code 18 transformations and the characteristics of a particular workload, without exhaustively 19 evaluating all combinations of code transformations and measuring their execution time.
Tools 21 As shown in FIG. 1, the spiff tool architecture includes a number of tools. These 22 tools include: Invariant Finder, Tracer, Invariant Checker, Program Interaction Deducer, 23 Domain Assertion Deducer, Snippet Finder, and Spiff Maker, each of which will be 24 described in further detail below. With the exemplary spiff tool architecture shown in FIG.
1, one skilled in the art would understand that many variations and other partitioning may be 26 without departing substantially from the spirit and principles of the disclosure.
27 The description here utilizes a particular Program Representation (PR), such as an 28 Abstract Syntax Tree (AST), which is a common computer-readable structuring of the 29 source code of the application. This invention also applies when the analysis is performed instead directly on the textual source code of the application, on a lower-level intermediate 31 representation (IR) of the source code, or even on the equivalent assembly or machine code

5 1 representation of the source code. We term each individual program construct utilized by the 2 PR a Program Expression (PE). For an AST, a PE is an AST node; for IR, a PE is an IR
3 instruction; for source code, a PE is a single statement 4 Invariant Finder Invariant Finder takes as input a PR of the DBMS to be specialized and Trace Events

6 (optional), performs static analysis on the PR and outputs zero or more Invariant Intervals.

7 Some definitions:

8 Invariant Interval: A set of paths defined by a single starting PE (or equivalently, a

9 single position within the source code), and a single ending PR node that is reachable during one or more possible executions from the starting node, over which a particular property of 11 a variable holds. An example of such a property is not written. (An interval can consist of a 12 set of paths, rather than a single path. For example, none of the branches of an if/else block 13 modifies the variable in question, so the variable remains invariant on all the code paths 14 associated with these branches.) Note that the Invariant Interval starts within the starting PE
(that is, as soon as the variable has that assigned value: the starting PE is always a statement 16 that sets the value of the variable) and ends within the ending PE, right before the value is 17 set again. At each PE along this path, the value of that variable will be the same as it is at 18 other points along that path, hence: the term invariant.
19 Invariant Interval Set: A set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node. An Interval perhaps may not be 21 maximal, in that it is terminated earlier than needed, if the analysis cannot ascertain that the 22 indicated property still holds after the execution of that PE
23 Value Flow Tree (VFT): A tree that captures the copying of one variable's value to 24 another variable. The VFT connects an Invariant Interval Set of variable X to an Invariant Interval Set of variable Y, when Y is assigned its value from X.
26 As an example, an invariant interval may exist over an interval where a value (i.e., 27 the property is a value) of the variable holds, e.g., "variable equals N" (for some constant 28 N). This may omit certain kinds of optimizations, e.g.:
29 = Optimizations based on program state, e.g., the memory allocation optimization in aggregate computations.

= Optimizations based on properties not of the form "variable equals N".
E.g., 2 given a code fragment if (p != NULL) S we know that the pointer p 3 must be non--NULL within S , and should be able to optimize away 4 redundant NULL-checks, e.g., in functions called from S.
= Optimizations based on derived values, e.g., string length, that may not be 6 explicitly materialized in the code.
7 = Optimizations based on domain knowledge, e.g., the cardinality of the set of 8 values that can appear in a column.
9 Example 1: if statements Consider the following, from Example 1 shown below:
1: int x = 0;
2: int y = read_from_file(...);
3: if (y < 55) 4: {
5: x = 6;
6:
: else {
. x = 13;

10:

11: int h = x;

12: int z = h + 88;

13: h = 10;

14: int a = x + 11;
11 Example 1 12 Here, Invariant Finder won't know statically whether the "if' statement will be true 13 or false. Thus, Invariant Finder should output the following Invariant Interval Sets for the 14 variable x:
For simplicity, we'll use line numbers instead of PE IDs to identify source locations.
16 We use closed-open intervals.
17 = Invariant Interval Set #1: Starts on line 1, with 1 invariant interval:
18 o Invariant Interval #1.1: Ends at line 3 1 = Invariant Interval Set #2: Starts on line 10, with 1 invariant interval:
2 o Invariant Interval #2.1: Ends at line 15 (that is, the end of the 3 program) 4 Invariant Finder may output the above in some structured format, such as XML;
however, in the present disclosure, lists and sublists will be utilized for simplicity.
6 Invariant Finder may vary in its precision but must be accurate.
Specifically, the 7 invariants that it produces should be correct, but do not necessarily need to be exhaustive.
8 For instance, x is actually invariant from line 1 to line 5, and from line 1 to line 9. However, 9 it's also accurate (but less precise) to, for example, just stop the interval at the beginning of the "if' statement. Of course, with less precise intervals, Snippet Finder and Spiff Maker 11 (tools which will be described below) will not have as many opportunities to field specialize 12 the application.
13 Invariant Finder could output such an Invariant Interval Set for every variable in the 14 program. Let's look at those for the variable h:
* Invariant Interval Set #3: Starts on line 11, with 1 invariant interval:
16 o Invariant Interval #3.1: Ends at line 13 17 = Invariant Interval Set #4: Starts on line 13, with 1 invariant interval:
18 o Invariant Interval #4.1: Ends at line 15 19 Variable y should be:
= Invariant Interval Set #5: Starts on line 2, with 1 invariant interval:
21 o Invariant Interval #5.1: Ends at line 15 22 z's should be:
23 = Invariant Interval Set #6: Starts on line 12, with 1 invariant interval:
24 o Invariant Interval #6.1: Ends at line 15 Finally, variable a's should be:
26 = Invariant Interval Set #7: Starts on line 14, with 1 invariant interval:
27 o Invariant Interval #7.1: Ends at line 15 28 Notice how variable h gets its value from variable x: its value "flows"
from x. z's 29 value, which in turn, "flows" from h. So, tying it all together, the VFT for x would be as shown in Example 2, below, given in an exemplar canonical representation.

<VFTs>
<VFT variable="x"
<link from="1" to="4" />, <link from="2" to="8" />
<link from="3" to="8" />
</VFT>
<VFT variable="h"
<link from="4" to="7" />
</VFT>
</VFTs>
1 Example 2 2 The numbers in the "from" and "to" attributes refer to one of the Invariant Interval 3 Sets (IS) above. So the first line indicated is from Invariant Interval Set #1 to Invariant 4 Interval Set #4.
Example 2: Pass-by-value functions 6 Suppose that a variable "a" is invariant in a function X( ) but temporarily changes its 7 value within a called function Y( a ). When Y ( a ) returns, the value of "a" still has its 8 (invariant) value. This situation is accommodated because a call to a function passing a 9 variable's value is a copy of that value to another variable, which is associated with its own Invariant Interval Set.
11 Example 3: Loops with no assignment 12 Consider the following code, from Example 3:
1: int x = 7;
2: for( . . . ) 3:
4: =
5 : some-Func(x, );
6: 1 7: some_other_func(x, );
8: x++;
13 Example 3 To understand loops, Invariant Finder should not actually unroll loops.
Rather, it 2 should check to see if there is an assignment to the variable within the loop. If not, as in this 3 example, then an invariant interval that reached that loop would extend across the loop:
4 * Invariant Interval Set #1: Starts on line 1, with 1 invariant intervals:
o Invariant Interval #1.1: Ends on 8 6 Example 4: Loops with assignment to existhg_variable 7 However, consider a conditional assignment to a variable in a loop, with reference to 8 Example 4:
1: int x = 7;
2: for( ... ) 3: {
5: if ( ...) 6: x = 42;
7:
8: some_other_func(x, ...);
9: x++;
9 Example 4 Here, Invariant Finder would create the following intervals:
11 e Invariant Interval Set #1: Starts on line 1, with 1 invariant intervals:
12 o Invariant Interval #1.1: Ends on 2 13 0 Invariant Interval Set #2: Starts on line 7, with 1 invariant interval:
14 o Invariant Interval #2.1: Ends on 9 = Invariant Interval Set #3: Starts on line 9, with 1 invariant interval:
16 o Invariant Interval #3.1: Ends on 10 17 Note that again, we're making the less-precise-but-still-accurate simplification to 18 exclude any loop that might potentially write to the variable. Further, it should be noted that 19 an interval is not ended at line 8, because the function call cannot change the value of x;
rather, this value is copied to a variable local to some_other_-Func.
21 Example 5: Loops with creation of new variable 22 Consider the creation of a variable in a loop, with reference to Example 5:

1: for(...) 2: {
===
4: int x = 42;
5: some_func(x, ...);
6: ===
7:
1 Example 5 2 Here, Invariant Finder would create:
3 (0 Invariant Interval Set #1: Starts on line 4, with 1 invariant interval:
4 o Invariant Interval #1.1: Ends on 8 (i.e., after the last iteration of the loop) 6 Example algorithm for Invariant Finder 7 Invariant Finder starts with the leaves of the call graph, that is, the functions that do 8 not invoke any other functions. It could then compute the VFT for that function, adding 9 edges when a variable is copied (e.g., h = x ; ). Then it could consider functions that only call leaf functions, and then add edges for local variables (such as when X is passed) and 11 merge intervals. Then it could iterate, considering functions that only call functions that 12 have VFTs computed for them.
13 Recursive functions and cycles within the call graph require additional care. The 14 bottom-up traversal of the call graph is a static analysis of the program. As an invariant must be true on all paths, Invariant Finder uses the signature and make the assumption that the 16 indirect call can be to any function that matches its caller signature.
17 Note that a memory allocation within a loop can give rise to many different runtime 18 locations. Once such an allocation occurs, the memory pointed to by that variable is 19 invariant until the variable is assigned. An allocation within a loop will either be assigned to a new element (say of an array) or will overwrite a variable 21 For indirect function calls, Invariant Finder can alternate forward-analysis steps, 22 which propagates values for function pointers to figure out the set of possible targets for 23 each indirect call, with backward analysis steps, which propagate value flows through the 1 call graph bottom-up as described above. This alternation can be iterated until the set of 2 function-pointer targets stabilizes.
3 The Invariant Interval may further identify the possible values that the variable may 4 take on. As an example, for a variable join_type, there may be only a few different values ever assigned to that variable, and they may all be known statically.
Sometimes this is 6 specified in the variable type (an enumeration) and sometimes this can be discovered by 7 static analysis, e.g., by examining all the values assigned to that variable. When the set of 8 possible values is small, Invariant Finder may record the invariant interval for each value.
9 Correctness Each invariant interval returned by the tool should be correct¨that is, the associated 11 variable should be guaranteed to be unchanged over all paths between the start and the end 12 of the interval, not including the end. If there are any indirect assignments within any of the 13 paths, the Invariant Finder tool must ensure that all such assignments cannot change the 14 value of the indicated variable.
The analysis may be conservative, in two ways. First, there may be false negatives:
16 intervals that are correct but are not returned by the tool, either (a) as an interval set or (b) as 17 an individual interval within an interval set. It is acceptable if the tool indicates where 18 variables are assigned (that is, starting an invariant interval) but not analyzed by the tool 19 (missing interval set) as well as the interval sets that are incomplete (missing individual intervals).
21 Second, there may be non-maximal intervals: intervals that do not end in a statement 22 that definitely changes the value. This could be caused by (a) an assignment that does not 23 actually change the value or (b) a non-assignment for which the analysis is not sufficiently 24 precise to determine that the value is not changed, for example, a "for"
statement that might change the value within that statement.
26 Correctness also requires that all value flow tree links be correct:
that each represents 27 the copying of a value. However, these links can be non-maximal, in that an interval set 28 need not be linked to another one, even if its value does indeed come from that other one.
29 Tracer Another tool disclosed herein is referred to as "Tracer." Tracer takes as input an 31 Executable under a Workload and outputs a sequence of Trace Events. A
Trace Events 1 output typically records an execution of an instruction that may affect data flow within the 2 program, such as "loop entered", "variable read", or "function call".
3 The Trace Events are processed by a further tool, "summarizer," to produce 4 Execution Summaries, which provide an output of functions, statements, and variables along with their execution statistics. Such information indicates "hot spots" within the application 6 that could benefit from field specialization.
7 Correctness dictates that if certain activities of interest occur during the execution, 8 that the relevant Trace Event is output and/or recorded, and that every output and/or 9 recorded Trace Event corresponds to an occurrence of an activity of interest, in the order indicated.
11 Invariant Checker 12 Another tool, "Invariant Checker," determines whether any violations of identified 13 Invariants (e.g., as identified by Invariant Finder) occur in a given execution, using the 14 Trace Events from that execution. (Alternatively, the developer can provide guidance to Invariant Checker by indicating important variables to watch.) Ideally, Invariant checker 16 would find no violations for the many executions of the DBMS Executable over many 17 Workloads (thereby confirming as correct the invariants found by Invariant Finder).
18 Invariant Checker may be run to periodically to further validate the analysis done by 19 the other tools (such as Invariant Finder and Tracer). Users of the application may run Invariant Checker, for example, and be provided with an indication that no violations were 21 found. On the other hand, if violations are found, the user may be provided with an 22 indication that violations were found, and may further be provided with a message to contact 23 technical support for assistance.
24 Another use for Invariant Checker is as a debugging tool, for example, employed by the developers of the tools described herein to ensure the correctness of the static analysis 26 (e.g., the invariants identified by Invariant Finder).
27 Program Interaction Deducer 28 The tool "Program Interaction Deducer" uses the PR (or an equivalent 29 representation, such as source code, IR code, or even assembly or machine code) and an Ecosystem Specification to deduce the Program Interactions, a list of data files along with 31 associated information. Basically, the Program Interaction Deducer ascertains where in the 1 program(s) values are stored in a file, where values are subsequently read from a file, and 2 where those values are removed from the file (or the file itself is removed). Those values 3 will then be determined to be long-term invariants within the Domain Assertion Inducer.
4 The Ecosystem Specification states (a) what data is involved, (b) which data files are fixed and which can vary, (c) which program(s) can create, access, and discard this data, and 6 (d) any concurrency requirements. In this disclosure, the focus is on files; however, in 7 general, this specification is concerned more generally with reading and writing data from 8 the outside world, which includes files, but may also include user I/O, streams to/from other 9 processes, and possible other ways for a program to get data as well as other interactions with the 0/S, such as allocating memory and dealing with character encodings.
Files may be 11 the most common way, and the focus of the discussion here, but it should be understood that 12 the present invention may utilize any other such form of data.
13 Table Spiff Use Case 14 To aid in the description of tools provided herein, some examples will be described in relation to a prototypical DBMS ("minidb"). We give in Example 6 an excerpt from the 16 minidb.h and minidb.c source files, which we will refer to repeatedly.
In minidb.h:
18 #define BUILDQUERY(query, exec command, table_schema, predicate count, predicates) \
19 do {
(query)->executor_command = (exec command); \
21 (query)->schema (table schema); \
22 (query)->num_predicates = (predicate count); \
23 (query)->predicate_list = (predicates); \
24 1 while (0) In minidb.c:
48 int SequentialScan( 49 int scan_direction, 50 int num_predicates, 51 Predicate** predicates, 52 const TableHeader* schema, 53 char* full_row_data, 54 unsigned long* row_values) 55 {
4==
73 for (i = 0; i < schema->num_columns;

..=

152 void ExecuteQuery(Query* query) 153 {
===
161 while (1) 163 ret_code = (query->executor_routine)( 164 query->executor_command, 165 query->num_predicates, 166 kquery->predicate_list), 167 query->schema, 168 full_row_data, 169 row values);
===

209 int OpenTable( 210 const char* data_file_name, 211 FILE** table_ptr, 212 TableHeader* table_header) 213 {

. . .
219 int byte read = fread( 220 &(table_header->num_columns), 221 1, 222 sizeof(long), 223 *table_ptr);

294 TableHeader GetTableHeader(char* table-Filename) 295 {
296 TableHeader result;
===
299 if (OpenTable(table_file_name, kresult.table_file), &result)) 305 return result;

308 void BuildPredicates( 309 char* predicates_text, 310 TableHeader table_header, 311 int* num_predicates, 312 Predicate** predicates) 313 {
314 if (strlen(predicates_text) == 0) 316 *num_predicates = 0;
317 return;

320 *num_predicates = 1;

321 char* substring = predicates text;
322 while ((substring = strstr(substring, "AND")) 1= NULL) 324 (*num_predicates)++;
325 substring += 3;

===

372 Query BuildAndPlanQuery( 373 TableHeader table header, 374 int num_predicates, 375 Predicate* predicates) 376 {
377 Query query;
378 BUILDQUERY(&query, SCAN_FWD, &table_header, num_predicates, predicates);
387 return query;
388 }
610 int main(int argc, char** argv) 611 {
===
617 while ((command = GetNextCommand(&command)) != NULL) 619 char command_type = command[0];
620 char* saveptr = NULL;
621 char* table name = strtok_r(command + 2, " ", &saveptr);
622 char* data_file_name = GetDataFileName(data_directory, table_name);

624 switch (command type) 626 case 'S':

634 TableHeader table header =
GetTableHeader(data_file_name);
===
640 Query query = BuildAndPlanQuery( 641 table_header, 642 num_predicates, 643 predicates);
644 ExecuteQuery(&query);
1 Example 6 2 An Ecosystem Specification, which can be provided by the developer as a 3 configuration file describing non-obvious traits of particular functions regarding data flow 4 manipulation in the application (an example Ecosystem Specification is shown in Example 7 below), would state that (a) the data starts out empty and the workload is read from stdin or 6 a file, (b) (workload) data can vary, (c) only minidb will access the data, and (d) at most one 7 instance of minidb will be running on any specific directory. The Ecosystem Specification is 8 essential to understand that the schema is invariant across executions of minidb.
<?xml version="1.0" encoding="UTF-8"?>
<ecosystemSpec xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="ecosystemSpec xsd">
<data>
<directory type="database" initiallyEmpty="yes" I>
<datafile type="table" initiallyEmpty="yes" />
<workload type="workload" initiallyEmpty="no" />
</data>
<programs>

<application name="minidb"
<dataAccess name="database" reads="OpenTable():1"
parallelAccess="no" I>
<dataAccess name="table" creates="CreateTable():3"
parallelAccess="no" I>
<dataAccess type="workload" parallelAccess="yes" >
<access from="stdin" at="GetNextCommand():8" I>
<access opens="GetNextCommand():13" 1>
</dataAccess>
</application>
</programs>
</ecosystemSpec>
1 Example 7 2 minidb uses two types of data: table, a file holding the rows of a table; and workload, 3 a file containing SQL statements. The type names are only to differentiate these files in the 4 rest of the description. Each table is in a directory (the database).
There is one program in this ecosystem: minidb. It creates table data files.
We give 6 the line of code that performs this action (e.g., line 3 in the CreateTable function), to tell 7 the Domain Assertion Deducer which file is being manipulated (here, the specific file 8 mentioned on that line of code). The Cons las font also used in the Examples are the names 9 of functions in the minidb source code. The verb "reads" indicates that the directory is not created nor removed by the application. For table data files, the file is indicated by file 11 passed to CreateTable . The verb "creates" also implies "opens,"
"reads,"
12 "writes," and "removes." (It may be possible for Domain Assertion Deducer to figure out 13 that each table is in the database directory, so the inDirectory attribute and perhaps the 14 entire directory element may not be necessary, shortening the Ecosystem Specification by one line.) 16 This program opens workload data files, which implies "reads." Here the file is that 17 passed to GetNextCommand( ). Or this file might be read from stdin in line 7 of 18 GetNextCommand( ). Multiple parallel instantiations of minidb might be reading the same 19 workload file, but won't be accessing or changing the database directory or the table files within it.

1 The Program Interactions extracted from the PR (see Example 8) states that minidb 2 creates table files in this directory, reads and writes them, and then removes them, indicating 3 exactly where in the source each of these file actions occur.
Furthermore, the table header 4 within a file is never changed within a file and that file is uniquely identified by the variable "data_file_name."
<?xml version="1.0" encoding="UTF-8"?>
<programInteractions xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="programInteractions.xsd">
<datafile type="table" application="minidb" >
<create at="CreateTable():3" inDirectory="database"
filename="data_file_name" />
<remove at="ExecuteDelete():38" final="yes" />
<remove at="main():57" final="yes" />
<data type="TableHeader" declaredAt="minidb.h:20" >
<add at="WriteTableHeader():3" data="table_header" I>
<read at="OpenTable():8" I>
</data>
<data type="RowHeader" declaredAt="minidb.h:29" >
<add at="WriteRow():3" />
<read at="SequentialScan():7" />
<delete at="ExecuteDelete():25" />
</data>
<data type="char *" >
<add at="WriteRow():6" I>
<read at="SequentialScan():15" />
<delete at="ExecuteDelete():15" I>
</data>
</datafile>

<workload type="workload" application="minidb" >
<data type="char *" >
<read at="GetNextCommand():8" />
<read at="GetNextCommand():18" />
</data>
</workload>
</programInteractions>
Example 8 2 The table data file is first created in the database directory. (Since a sole application 3 is used in this example, minidb, we can specify it in the datafile rather that at the add, 4 remove, ext. operations on that data.) This file contains three data structures: the TableHeader, multiple "RowHeaders", each with the row (a string). The analysis in 6 subsequent tools doesn't need to know the inclusion structure; all that is needed is the data 7 structures written and subsequently read. Of course, once data is written to a file, it can be 8 read, possibly many times, before that data is deleted.
9 The lifetime of a file extends beyond an individual execution of an application. One execution might create the file, another might subsequently write data to that file, another 11 might subsequently read that data, and another might subsequently remove the file. The 12 critical semantics is that data written to a file will be the same data subsequently read from 13 that file, until that data is deleted from the file or the file itself is removed. The other critical 14 semantics is that we know from the PR the actual C structs written out to the file and subsequently read in.
16 Coming back to the particulars of the prototypical DBMS (i.e., minidb), 17 interestingly, the delete is actually a file write. What happens is that the row is written over, 18 effecting a delete of the original row.
19 The logic in ExecuteDelete( ) is particularly complex: a temporary file is created, the rows before the row to be deleted are copied over to the temporary file, the rows after the 21 row to be deleted are copied over, and then the temporary file is renamed. Program 22 Interaction Deducer may include such logic to handle these particulars.
23 Table Spiff Instance Use Case 1 Table spiff instances are associated with particular rows in the database, the handling 2 of which is discussed above.
3 Row Spiffs 4 The notion of a "row" seems quite domain-specific. However, the general notion is of a part of a data file that is read, written, and processed as a whole.
There is also the notion 6 of a query evaluation loop over each row, but that can also be generalized as the portion of 7 code used to process each ascribed part of the input file. So recognizing a row spiff requires 8 recognizing when a portion of the input file is processed, with the same code being used 9 repeatedly over different portions, each with the same structure.
The implementation of row spiffs requires (i) determining which invariant values to 11 utilize in partitioning the data, (ii) placing a spiff id in the data, and (iii) possibly removing 12 data values that can be determined from the spiff id. The first step uses the cost model, 13 which relies on the workload. The second actually changes the structure of the input data 14 and so must change each relevant program in the ecosystem, those that read or write that portion of the data. The third challenge would be handled similarly.
16 Hence, the unique aspects of the notion of a row spiff is (a) identifying portion(s) of 17 the data that are processed in a unit and (b) changing the data so that it can be more 18 efficiently manipulated in the program(s) that access this data.
19 Query Spiff Use Case Query spiffs are a combination of query, table, and row invariants. The last two are 21 dealt with above, while query invariants are found by Invariant Finder, as in this case they 22 do not persist across minidb executions, because the workload where queries come from is 23 only read and could be used by several minidb incarnations (e.g., as pa rallel.Ac ess is 24 allowed).
Example Algorithm for Program Interaction Deducer 26 As shown in FIG. 1, Program Interaction Deducer has two inputs: the Ecosystem 27 Specification and the PR. While the Ecosystem Specification focuses on the programs that 28 read and manipulate the data, the Program Interactions produced focuses on what is done to 29 files, in particular, where data structures within the programs are written to and read from files. To do so, the Program Interaction Deducer, or PID, analyzes file manipulation system 31 calls, in particular, fopen ( ), fwrite( ), and remove( ). It uses as its starting point the 1 <datafile> and <workload> specified by the Ecosystem Specification, in this case, table 2 and workload (e.g., as shown in Example 6). (Note that PID also analyzes database, but 3 figures out pretty quickly that this is a directory that is only read by OpenTable().) 4 Between these file manipulation calls, PID watches the flow of FILE*
values.
The workload file is particularly easy to analyze. The Ecosystem Specification 6 specifies that this file is opened at GetNextCommand ( ):13. (The file can also be read from 7 stdin.) PID determines by analyzing the source code referenced by the specification:
8 e that this file is named by queryfile_name, 9 = that it is associated with the FILE query file and stdin, and = that the only reads of this file are of character strings read from this file at 11 GetNextCommand( ):8 and GetNextCommand0:18.
12 The Program Interactions Deducer thus outputs this determined information into the 13 Program Interactions file, as shown in Example 7.
14 The table file has more complex behavior. The Ecosystem Specification states creates="CreateTable( ) :3" indicating that we need to follow data_file_name, 16 which originates in the data structure TableHeader.table_file as deduced by analyzing 17 the referenced source code.. So PID can see the flow:
18 = from main( ) : case 'C' to CreateTable( ) 2 (fopen()) 19 = then a few lines later a call to WriteTableHeader( ): 3 (fwrite()) = back to main( ) and then to many rows getting written (in 21 WriteRow( ) : 3and 6 via case fwrite()) 22 * and rows getting deleted (in ExecuteDelete() :25 via case 'D': the 23 absence of an fwrite( ), though this will be challenging to detect), 24 = followed eventually by the file being deleted within main( ) : 57:
remove), again, by examining the referenced source code.
26 From this overall sequence of operations on a table datafile, associated with the C
27 FILE table_header.table_file, PID can deduce 28 = that the TableHeader data structure is written to the table file at 29 WriteTableHeader( ) :3, then = subsequently read at OpenTable() :8.

1 Interestingly, that is all that is done with a TableHeader: it is only written once and 2 never deleted from the file.
3 PID can also determine that the RowHeader data structure is 4 = added to the table file at WriteRow() :3, * subsequently read at SequentialScan() :7, and 6 = removed from the file at ExecuteDelete() :25.
7 Finally, PID can determine that character strings are 8 = added to the table file at WriteRow() :6, 9 = read at SequentialScan( ) :15, and = removed from the file at ExecuteDelete() :15.
11 The analysis performed by PID is thus to analyze how each program manipulates 12 each file identified in the Ecosystem Specification, by following the values of variables of 13 type FILE and observing:
14 1. where the name of the file comes from (also a variable within the program), 2. where the file is opened, 16 3, from there, where does the file value flow in the program, 17 4. and hence, what data structures are (i) written to and subsequently (ii) read from 18 and subsequently (iii) deleted from the file, 19 5. and finally, where that file is deleted or closed.
It should be noted that this analysis is entirely within the context of a single 21 execution of a single program. If there are multiple programs, each is analyzed separately.
22 There may often be multiple executions of each program, but the analysis only considers a 23 single execution.
24 The PID analysis first must:
= find the file variables, 26 = calculate the value flow of such values, 27 = and along the value flow, identify file open, read, write, and delete 28 operations, 29 = for each, identifying specific information to be recorded in the Program Interactions.

1 The Domain Assertion Deducer, described below, takes the per-execution behavior 2 extracted by PID and stitches them together into a holistic understanding of how data flows 3 from programs into files and then back into (perhaps subsequent executions of) programs, 4 thereby computing invariant flows across program executions, something traditional compiler analysis cannot do.
6 Domain Assertion Deducer '7 The Domain Assertion Deducer tool uses the PR, identified Invariants, and the 8 Program Interactions to deduce the Domain Assertions. For table spiffs, the Program 9 Interactions imply which schema information is invariant. For table spiff ilstances, the Program Interactions is essential to understand where in the program(s) rows are created, 11 accessed, updated, and deleted. Query spiffs can exploit schema, row, and workload-12 generated invariants, in conjunction with invariants of smaller scope that are within the 13 purview of conventional compiler optimization techniques. Specifying Domain Specific 14 Knowledge describes some of these computations in some detail. Then the Domain Assertion Deducer stitches together the Invariant Interval, following the values as they are 16 read from and written to files, perhaps across multiple invocations of the program(s), to 17 derive the complete lifetime of a value, which is then encoded in the Domain Assertions. If 18 the Workloads are complete, that is, if they completely characterize the possible operations 19 on the data, which can be the case in some batch applications, the Domain Assertion Deducer (DAD) can also deduce a restricted set of possible values for the invariant variable.
21 It is precisely this information--Domain Assertions and perhaps a set of possible invariant 22 values--that a conventional compiler does not have.
23 An important aspect of field specialization is that it makes use of information not 24 generally available to a compiler. This information is of two forms: (i) domain-specific knowledge and (ii) extra-source knowledge. Both kinds of knowledge go beyond what a 26 compiler can discover or conclude by just looking at the source code of a single program.
27 Field specialization takes into account the much broader ecosystem of the program being 28 specialized: what data will it read or manipulate, what programs will it invoke, what other 29 programs will also read or manipulate this data, what operating system(s), network routers, and storage systems will be involved? This ecosystem provides a great deal of information 1 that field specialization can exploit to increase the efficiency of the program (and its data) 2 being specialized.
3 Compared to somewhat generic extra-source information, domain-specific 4 knowledge applies only to programs in a particular domain. An example of domain-specific knowledge is "all changes to a table's schema will be serializable." The notion of 6 serializability is a complex one that arose out of the database domain, though it is finding its 7 way into other parallel and distributed information processing applications. Such knowledge 8 allows creating table spiffs that speed up the DBMS, including indicating exactly where a 9 table spiff should be created and where it should be destroyed.
A second form of domain-specific knowledge is that of the workload of a program.
11 An example is "OLAF (on-line analytical processing) applications exhibit little data 12 volatility: (often complex) queries dominate during the day, with updates occurring 13 infrequently, typically overnight." Such information is of the form "this activity is more 14 frequent than that other activity," thus providing guidance to field specialization, allowing it to make better-informed decisions trading off work now that will speed up something else 16 later on.
17 An example of extra-source knowledge is a particular portion of a file that is written 18 to and read by only the one program will remain until either the code that modifies that 19 portion or the code that deletes the file is executed. Such knowledge allows creating spiffs that speed up any program that processes an input file repeatedly.
21 Preferably, the domain-specific and extra-source knowledge is formalized, so that 22 the Spiff Finder can read files containing domain assertions and extra-source assertions that 23 state in a formal manner such knowledge. The Spiff Finder would then read the files 24 comprising the DBMS source code (or more generally, the source code of any program in the domain described by the domain spec) and output the spiff invariants, for use by the 26 Spiff Maker.
27 Table Spiff Use Case 28 Coupling the information from the Program Interactions with the Invariant Interval 29 on main( ) :table_header.num_columns provides the information needed by DAD to produce the following domain assertion, shown in Example 9.

// start a segment when the table is created // this includes the directory {<CreateTable():3, CreateTable.data_file_name>}
// and end when it is dropped {<main():57, main.data_file_name>}
], {data_directory, main.table_name -> main.table_header.num_columns}
2 Example 9 3 Note that we don't include ExecuteDelete( ) :38. That is because our analysis 4 notices that that is a rename.
Table Spiff Instance Use Case 6 Database table rows differ from schemas in that there are multiple rows per table but 7 only one schema. In the above, there is one value of table header.
num_columns stored 8 in a file associated with a table; this value is initially written out to a file and subsequently 9 read back in. The Program Interactions tells us that, as well as that various different values of row data are written to the same file.
11 We tell rows apart by where they reside in the data file, that is, by the file position 12 (the offset of the first byte of the row).
13 The domain assertions generated by DAD, as shown in Example 10, differentiate 14 rows with a keyword OF FS ET on the left-hand-side of the function dependency, which indicates that the functional dependency holds only when the current position in the 16 execution of the program within the indicated file to be read or written is a particular value.
// start a segment when the row is created // this includes the directory {<minidb.c:main():43, CreateTable.data_file_name, OFFSET>}

// and end when the table is deleted f<minidb.c:main0:57, main.data_file_name>, //or when the row is deleted <minidb.c:ExecuteDelete0:25, ExecuteDelete.data_file_name, OFFSET>1 1, {data_directory, main.table_name, OFFSET -> SequentialScanO.row_data, SequentialScanO.row_values}
1 Example 10 2 Intuitively, while there is data at the position when the read occurs, that data will be 3 the same at the data written to that position earlier, stated by the segment start. DAD notices 4 that multiple values of a variable in the program are written to the same file, and then uses OFFSET to differentiate those values. Note that one of the segment end statements does not 6 include OFFSET, as that ends all row invariants for that entire file, as the file is deleted.
7 The DAD may even be able to tell that case 'D' (a minidb command, which is 8 interpreted in minidb's implementation using a switch/case statement) moves data packets 9 from one OFFSET to another within the file. It will be readily understood by those skilled in the relevant field that one need only to extend the domain assertion formalism to 11 accommodate such moves.
12 Other than the OFFSET, then, a row invariant has the same structure as a table 13 schema invariant.
14 Going further, each file written to or read from an application is considered to be composed of data packets, each of which is an external foil!i of values in local variables of 16 the program that are written and read as a unit. So minidb.c places in files first a schema 17 data packet (including the value of table header, num_columns) and then a sequence 18 of row data packets (including the value of row values).
19 Query Spiff Use Case (Example 11) There are two cases. The first is when the workload (that is, the queries) come from 21 stdin, in which case no domain assertion can be deduced, because the user could type 22 anything in. (Of course, we can still use the VFT to determine the (many) invariants active 1 during a query, but that was already done by Invariant Finder, in a previous step.) The 2 second is when workload comes from a file, say as a file named in the invocation argument, 3 in which case the domain assertion is essentially the same as that for a row invariant, in that 4 it deals with the OFFSET. For this second case, we use the name of the file to denote the actual file as the source of the workload. These two cases are differentiated by the 6 Ecosystem Specification, which in the second case states that a particular Workload file 7 could be specified. That said, we may not know who created the workload file, and in such a 8 ease we cannot yet create query spiffs for that workload.
// start a segment when the -predicates structure is initialized {<minidb.c:main0:28, GetNextCommand.query_file_name, OFFSET>}
// and end when the struct is freed {<minidb.c:main0:34, GetNextCommand.query_file_name, OFFSET>}
1, {GetNextCommand.query_file_name, OFFSET -> mainO:predicates}
]
9 Example 11 The second case may be used to specialize on a workload. Query spiff ids may then 11 be put into the workload or as an association stored somewhere else, and used when that 12 workload was executed.
13 In the following, we will just consider the first case, where the query is not known 14 until it is read in.
Note that a query spiff that only includes invariants in effect during a query should 16 be discoverable by an aggressive optimizing compiler. But more importantly, a query spiff 17 combines such query invariants with schema and row invariants, which cannot be 18 discovered by a compiler, because such schema and row invariants require knowledge about 19 the semantics of file reads and writes. It's that aspect that makes a query spiff a true spiff.
Example Algorithm for Domain Assertion Deducer 21 Trace partition elements come from program interactions: where data files or 22 components of data files (here: table header and rows) are created, inserted, or deleted. The 1 dependencies within domain assertions come from the directory and file name and 2 optionally an OFFSET within the file. The Invariants tie these together, so that values of 3 variables within the application can be seen to flow through the application code, out into a 4 file, and later back into the application, thereby ascertaining long-lived invariants, for which candidate snippets and ultimately spiffs can be determined.
6 For the table spiff use case, in which the table_header.num_columns is 7 characterized in a domain assertion, DAD can determine that:
8 = main( ): Case 'C' calls CreateTable() 9 * which calls WriteTableHeader0 * which writes the table header to the file.
ii The table header:
12 = is read by this and subsequent invocations of the program, 13 until the file is deleted, at main( ) : 57 .
14 This implies that the table header in a table <datafile> is only written once and never modified by the program. Knowing this, DAD can generate the appropriate trace 16 partition elements, that of the table <datafile> being created and deleted. DAD can also 17 create the dependency concerning the table_header.num_columns.
18 For the table spiff instance use case, the relevant data packets are the row_data and 19 row_values being added to the table <datafile>, perhaps deleted from that file, and ultimately removed when the file itself is deleted. DAD determines that once the file is 21 created, the program may store multiple values of row_data into the file, thus each such 22 packet can be identified by the OFFSET it resides at.
23 The above observations provide an algorithm for DAD. For each FILE that is created 24 or opened by a program, DAD figures out where that file was initially created and where the name for that file comes from, via the VFT. Then, for each data structure that is stored in 26 that file (these are the <d at a> elements in the Program Interactions), DAD ascertains the 27 file operations performed on that data structure (adding that data structure to the file, 28 possibly changing or removing that data structure, and finally deleting that file). These 29 operations then imply the appropriate trace partition elements. Finally, from the program data structures used in these operations (the C or C-1-+ program data structure written to the 31 file), DAD can inspect the VFT to determine where the values in such program data 1 structures originate, to imply the dependencies. DAD also determines whether a file contains 2 only one data packet (as in the case of n um_c o 1 umn s) or multiple packets (as in the case of 3 row data) by tracking what is done on each FILE variable as it flows through the program, 4 also determinable via the VFT. Multiple packets require an OFFSET in domain trace partitions and dependencies.
6 Correctness 7 Correctness dictates that the Domain Assertions that are produced are complete and 8 are consistent with the input PR, Invariants, and Program Interactions.
9 Snippet Finder As shown in FIG. 1, another tool, Snippet Finder, takes as input:
11 = One or more Invariants 12 = An PR
13 = One or more Execution Summaries 14 = Domain Assertions Program Interactions, and = A Cost Model 16 Snippet Finder outputs one or more <spiff> elements, each containing one or more 17 Candidate Snippets, each of which contains:
18 = an Interval of code identified by PEs 19 = a set of Invariants = a set of possible values for each Invariant 21 = the source location(s) where the value of each invariant was first written to a 23 = the source location(s) where that value was removed from a file, 24 = the source location(s) where the value is read in each time, = the appropriate lifetime of the Candidate Snippet, that is, when the associate 26 spiff can be created (and whether at compile-time or run-time), and 27 = optionally, suggested optimizations to be employed within the Interval.
28 Each Domain Assertion implies an interval that is probably broader than just one 29 program execution, in contrast to the interval recorded in Invariants, which has a scope within a single program execution.

1 Snippet Finder uses Domain Assertions to expand the scope of the Invariants and to 2 refine the set of possible values for each Invariant. The interval of each Invariant overlaps 3 (either partially or entirely) the interval of the Candidate Snippet.
Furthermore, the interval 4 of each Candidate Snippet is tailored by the Cost Model, to minimize the size of the interval while maximizing the savings, calculated as the cost of executing an optimized version of 6 the snippet times the number of times that snippet was evaluated, drawn from the Execution 7 Summaries, plus the cost of invoking a spiff. To do so, the Snippet Finder must have an idea 8 of the possible optimizations that the Spiff Maker performs and the benefit of each such 9 optimization, the latter from the Cost Model.
Query Spiff Use Case (Example 12) 11 There are two major differences between query spiffs and the table spiffs considered 12 earlier. The first is encountered at this step: Snippet Finder uses the Invariants, not from the 13 Domain Assertions, as queries usually do not persist (though see the discussion above about 14 Workloads being given to stdin). The second will be encountered later:
Spiff Maker needs explicit guidance from the Ecosystem Specification as to where the boundary is between 16 compiling spiff code and just instantiating a spiff instance at run time.
17 Snippet Finder infers:
18 an Interval of code from SequentialScan() :40 (immediately after 19 unpacking the data packet into row_values [I) to SequentialScan :

(the end of the method), 21 = a set of Invariants: main( ) .query, in particular, 22 query. executor_routine, query.executor_command, 23 query. num_predicates, query. predicate_list and predicates[]
24 read from stdin and query. schema from the table spiff use case, 9 a set of possible values for each Invariant, in this ease, 26 query. executor_routine is always SequentialScan( ), 27 query. exec utor_command is always SCAN_FWD. For each predicate, 28 column Id is an arbitrary int read from stdin (deduced from the 29 assignment to that field in BuildPredicates 0), constant operand is an unsigned long read from stdin, and operator_function is either 31 &Equalint4 or &LessThanInt8, 1 = the source location(s) where the value of each invariant was first determined:
2 the value of query is determined by main() ): 32, that is, right after the call to 3 BuildAndPlanQuery( ).
4 = the value of query is never written to, removed from, or read from a file, = the appropriate lifetime of the Candidate Snippet: the lifetime is just within 6 the 'S' switch; the Ecosystem Specification tells us that we cannot call the 7 compiler there, so we just pre-compile spiffs for executor_routine as 8 SequentialScan( ), for executor_command as SCAN_FWD, for 9 num_predicates from 1 to 6, and for each such predicate, operator_function is either &EqualInt4 or &LessThanInt8, 11 = unroll the for loop at SequentialScan( ) :47 as it is terminated by the 12 value of schema- >num_columns.
<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xsor>
<spiff createAt="compileTime" >
<invariantIntervalSet variable="query"
start="main():28 >
<interval stop=main():28" I>
<generate variable="query->num_predicates" fromValue="1"
toValue="6" I>
<possibleValues variable="query->executor_routine" >
<possibleValue value="SequentialScan()" I>
</possibleValues>
<possibleValues variable="query->executor_command" >
<possibleValue value="SCAN_FWD" I>
</possibleValues>
<possibleValues variable="query->predicate_list[]->operator_function" >
<possibleValue value="&EqualInt4" I>
<possibleValue value="&LessThanInt8" />

</possibleValues>
</invariantIntervalSet>
<snippet existsFrom="SequentialScan0 :40"
existsTo="SequentialScan( ) :59" >
<suggest opt="Unroll loop" at="SequentialScan():47" />
</snippet>
</spiff>
</candidateSnippets>
1 Example 12 2 It is not clear how f romVa1ue and toValue are determined, but it seems that it is 3 to bound the number of compile-time query spiffs that are generated.
4 Example Algorithm for Snipper Finder Snippet Finder first expands invariants to across program executions by tracking 6 which variables are read from files and where those values are put into files. This results in 7 invariant intervals that span multiple executions. The tool also needs to track when the value 8 was first written to the file and when it is deleted.
9 The other challenge to Snippet Finder is in bounding the snippets, using the Cost Model. In doing so, the tool needs to know what optimizations Spiff Maker can effect, and 11 under what conditions each optimization is feasible.
12 Correctness 13 Correctness of this tool dictates that each of the Candidate Snippets produced by this 14 tool be consistent with the input Invariants, PR, Execution Summaries, Domain Assertions, Program Interactions, and Cost Model. The indicated invariants should indeed be invariant 16 over the snippet, the suggested optimizations should be consistent with these invariants and 17 their manipulation within the PR, and the possible values should indeed be possible.
18 It is desirable though not required that:
19 = the snippets that are returned are the most desirable, given the cost model, = the snippets be maximal, in that making them larger would result in a larger 21 cost, from the cost model, 22 = the snippets be minimal, in that making them smaller would result in a larger 23 cost, from the cost model, and 1 6 that the suggested optimizations would be helpful, given the cost model.
2 Spiff Maker 3 As shown in FIG. 1, another tool, Spiff Maker, takes as input one or more Candidate 4 Snippets and a PR, and produces as output Specialized Source Code.
Specifically, for each input Candidate Snippet, Spiff Maker should perform the 6 following tasks:
7 1. Create a .h file for the sniff pattern, defining all pattern parameters and spiff 8 pattern functions.
9 2. Create a .h file for the spiff implementation declarations.
3. Create a .c file for the spiff implementation definitions.
11 4. Insert code in the appropriate place(s) to create the spiff (for dynamic spiffs), to 12 invoke the spiff, and to destroy the spiff (again, for dynamic spiffs).
13 Each use case is associated with a specified branch of minidb, for concreteness. Each 14 branch includes the Candidate Snippet that causes the generation of that configuration.
It may be convenient to use TXL for the actual transformation, as a PR-to-PR
16 transformation, with the transformed PEs, then converted back into textual source code to 17 create the spiff. TXL includes a parser, but it may be possible to take PEs directly. TXL also 18 includes a syntax tree unparser which may work with our PEs.
19 For Spiff Maker to function as described, it may require some guidance based on domain knowledge. Specifically, Spiff Maker may need to be given/told:
21 Specifications for all static implementations to be produced.
(ie, which 22 variables(s) are specialized, and their values if so.) 23 = Rules for disambiguation, for cases where more than one static 24 implementation applies = Rules for creation of dynamic implementations: Are they allowed at all? Are 26 they cached? If so, how? In memory? On disk? What are the size(s) and 27 management rules for the cache(s)? Should a generated dynamic 28 implementation be fully specialized, or would it be better to only partially 29 specialize, and leave some parameters generic? Is it acceptable to compile a dynamic implementation on-the-fly as needed, or is it only acceptable to use 1 a dynamic implementation if it's already in the cache? Do the answers to 2 these questions change?
3 = Should a fully-generic implementation be created (and used as a fallback 4 inside)? Or, will some variables always be specialized, one way or another?
(This will determine which variables will need to have representations in the 6 data block.) 7 In general, Spiff Maker will be told all of the above in the input file.
It's Snippet 8 Finder's job to figure out how many static implementations to create, whether it should be 9 static/dynamic, which variables are specialized and which are not, and so on. There will only be a single static implementation to create, and that single static implementation is always 11 the one that should be called.
12 Query Spiff Use Case 13 Referring to Example 13, the input is as follows, indicating compile-time query spiff 14 for exec utor_comma nd as SCAN FWD, for num_predicates froml to 6, and for each such predicate, operator_function is either &Equa1Int4 or &LessThanInt8, as 16 specified by Snippet Finder.
<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets "
xsi:noNamespaceSchemaLocation="candidateSnippets.xsd">
<spiff createAt="compileTime" >
<invariantIntervalSet variable="query"
start="main():28 >
<interval stop=main():28" />
<generate variable="query->num_predicates" fromValue="1"
toValue="6" />
<possibleValues variable="query->executor_routine" >
<possibleValue value="SequentialScan()" />
</possibleValues>
<possibleValues variable="query->executor_command" >
<possibleValue value="SCAN_ND"
</possibleValues>

<possibleValues variable="query >predicate_listrj->operator_function" >
<possibleValue value="&EqualInt4" I>
<possibleValue value="&LessThanInt8" I>
</possibleValues>
</invariantIntervalSet>
<snippet existsFrom="SequentialScan():40"
existsTo="SequentialScan():59" >
<suggest opt="Unroll loop" at="SequentialScan():47" I>
</snippet>
</spiff>
</candidateSnippets>
1 Example 13 2 As noted above, Spiff Maker needs explicit guidance from the Ecosystem 3 Specification as to where the boundary is between compiling spiff code and just instantiating 4 a spiff instance at runtime. We assume that Ecosystem Specification specifies that that bound occurs at the `S', 'I', and `D' cases: that no spiffs can be compiled within anything 6 called by these three cases. (That emphasizes knowledge about what delays users will find 7 tolerable. Note that compiling a new spiff for a particular row might be particularly 8 beneficial for speeding up a collection of Workloads viewed as a whole, but the user may 9 still want to specify that that not be done, say because a particular workload must itself run faster with field specialization.) This is the reason why Spiff Maker includes the Ecosystem 11 Specification as an input.
12 Spiff Maker thus makes a spiff for a portion of SequentialSc an ( ) , one for each 13 value of num_columns at compile--time, for query exec utor_command always being 14 SCAN FWD. For each predicate, column Id is an arbitrary int, constant_operand is an unsigned long read from stdin, and operatoriunction is either &EqualInt4 or 16 &Les sThanInt8, with spiff 0 a non-specialized version that can handle any number of 17 num_columns. The relevant transformations are loop unrolling and constant folding. Spiff 18 Maker will produce a spiff pattern for spiff1D of 23, for num_predicates = 2, with the 19 first having column_id = 2 and operatoriunction &Equalint4 and the second 1 having column_id = 7 and operator function RessTharant8, as the following.
2 Note that query spiff s ID computation is normally associated with the particular values of 3 the spiff pattern parameter. Spiff Maker should utilize an application-specific ID generation 4 mechanism to produce the proper spiffID. In this example however, we are going to assume a computed spiffID of 23.
6 Example Algorithm for Sniff Maker 7 Spiff Maker decides only one thing: whether to allow the compiler to perform the 8 optimization, after Spiff Maker has indicated what the invariant value(s) are, or to perform 9 the optimization manually, by generating different code.
Spiff Maker then cobbles together the generated files by copying mostly verbatim 11 from the original source to the specialized source, using the file names, line numbers, and 12 column counts within the relevant PEs to determine the extent of what is copied and of what 13 is replaced say with a spiff parameter (e.g., num_columns). Hence, Spiff Maker needs to do 14 a very limited amount of parsing and unparsing, with most of its effort consisting of copying code from the appropriate place in the original source to the appropriate place in the 16 specialized source.
17 Correctness 18 Correctness dictates that the code compiles and runs and is identical in semantics to 19 the original code it replaces, while being consistent with the input information.
The following discussion provides additional examples demonstrating the creation of 21 a MiniDB Table Spiff and a MiniDB Query Spiff.
22 MiniDi Table Spiff 23 The following example demonstrates the creation of a table spiff from the invariant 24 schema- >num_columns == CONSTANT in the SequentialScan() function, shown in Example 4.
26 Invariant Finder 27 In the above example, Invariant Finder should identify the following Invariant 28 Interval Sets for the SequentialScan :
schema->num_columns variable:
29 = Invariant Interval Set #1: Starts on line 52, with 1 Invariant Intervals:
o Invariant Interval #1.1: Ends on line 114 1 Invariant Finder should also produce a VFT to show where the variable 2 SequentialScan(): : schema- >num_columns got its value:
3 = SequentialScan() :schema->num_columns got its value from 4 ExecuteQuery(): :query->schema->num_columns ExecuteQuery(): :query->schema->num_columns got its value from 6 main() :query->schema->num_columns 7 = main() :query->schema->num_columns got its value from 8 main() : :table_header->num_columns 9 = main() : :table_header->num_columns from the f read() in OpenTable().
11 Thus, the value of SequentialScan() :schema->num_columns eventually 12 comes from a call to freadO in OpenTable().
13 Invariant Checker 14 Invariant Checker would verify that once main(): :table_header->num_columns was assigned to on line 634, and that the value 16 of this variable was never changed through the particular end node(s) taken by that 17 execution of the given Workload.
18 If the Domain Assertion Deducer executed before Invariant Finder, the Invariant 19 Checker could check to ensure that the actual value was included in the Possible Values.
That might be able to reduce the scope of the Invariant Finder, by focusing that analysis on 21 particular values or variables.
22 Snippet Finder 23 Snippet Finder should determine, through an analysis of the Execution Summaries in 24 conjunction with the Cost Model, that the 'C', 'I', and 'D' cases are too expensive to create spiffs, but that the computation time within the '5' case is sufficient to suggest that that case 26 be specialized.
27 Let's start with a simple Cost Model that just states that a PE (or other equivalent 28 embodiment) that is executed less than a fixed or percentage of times will not be specialized.
29 In such a case, Snippet Finder should then infer from the domain assertion that schema- >num_columns is invariant across the body of SequentialScan() with a scope 31 of when the data file was created to when that file was removed, thus indicating that the 1 value of that variable was first written to the file when the number of columns is stored, in 2 WriteTableHeader (): 3, which is executed shortly after minidb.c:553. The value was 3 never removed from the file, but the file itself was removed at main( ) :
57. This indicates 4 that the spiff can be created at compile-time. The snippet should extend from Exec uteTa ble( ) : 20 through ExecuteTabl e ( ) :23. That is the scope of the snippet, 6 specialized on num_columns, which is determined from seeing what other statements can 7 be specialized on num_columns. (Essentially, num_columns is used infrequently, and thus 8 far away from this specialization opportunity.) However, doing an extra indirect call is 9 expensive, so Snippet Finder expands this snippet to the entire body of Exec ut eQuery( ), which has another use of this invariant on line 7. Finally, Snippet Finder should suggest loop 11 unrolling for this Candidate Snippet.
12 In this case, the spiff will have but one spiff function, indicated by the < snippet >, 13 as shown in Example 14.
Oxml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xse>
<spiff createAt="compileTime" valueRead="OpenTable():8">
<invariantIntervalSet variable="ExecuteQuery.query->schema->num.num_columns"
start="ExecuteQuery():1" value=">0"
<interval stop="ExecuteQuery():33"/>
</invariantIntervalSet>
<snippet existsFrom="WriteTableHeader():3" existTo="main():57"
replaceFunction="ExecuteTable()">
<suggest opt="Constant fold" on="num_columns"
at="ExecuteQuery():7" />
<suggest opt="Unroll loop" at="ExecuteQuery():21" I>
</snippet>
</spiff>

</candidateSnippets>
1 Example 14 2 Snippet Finder could infer from the domain assertion that data packets are created in 3 cases 'I' and 'D' of main( ) and removed is cases 'P' and 'D'. More specifically, Snippet 4 Finder infers:
9 an Interval of code from SequentialScan ( ) :16 (immediately after the 6 data packet is read in) to SequentialScan() :38 (the end of the unpacking 7 for loop), 8 = a set of Invariants: values from the row_data and the schema, from the 9 invariant analysis above, a set of possible values for each Invariant, in this case, that the value of 11 num_columns is 3, that the value of the first column is a hard-coded int, 12 and schema, that the type of the first column is int, of the second column, 13 long, and of the third column, int, an array of arbitrary char, 14 = the source location(s) where the value of each invariant was first written to a file: WriteRow() :18 and WriteRow() :25, 16 = the source location(s) where that value was removed from a file:
main() :57 17 and ExecuteDelete() :25, 18 e the source location(s) where the value is read in each time:
19 main():5equentialScan():3, e the appropriate lifetime of the Candidate Snippet: built at table definition 21 time using the query. schema invariants, with the row_values provided 22 when the spiff is instantiated at run-time, as this involves data packets that 23 might be inserted, a few queries run, then the data packet removed, so must 24 be fast, and also because the number of possibilities is very great for the row-values, and 26 unroll the for loop at SequentialScan ( ) :16 as it is terminated by the 27 value of sc hema- >num columns and includes use of row data and 28 schema values.

1 As shown in Example 15, note that the analysis combines the broader scope of the 2 table invariant with the narrower scope of the row invariant, and employs different strategies 3 for each: the former allows generating code when the table spiff is defined, whereas the 4 latter involves instantiating the spiff at runtime by providing value(s) for the row_values array. In field specializing DBMSes, the schema invariants will play a large roll in table 6 spiff instances and query spiffs, which involve invariants of successively narrower scope.
<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xsd"
<spiff createAt="WriteRow():19" removeAt="main():57"
getSpiffID="SequentialScan():16" >
<invariantIntervalSet variable="CreateTable.table_header"
start="CreateTable().30" >
<interval stop=main():57" I>
<interval stop="ExecuteDelete():25" 1>
</invariantIntervalSet>
<invariantIntervalSet variable="main.values"
start="main():42" >
<interval stop="main():42" I>
</invariantIntervalSet>
<instanceParameter value="row_va1ues[1]" for="1=1" />
<snippet existsFrom="SequentialScan():16"
existsTo="SequentialScan():38" >
<suggest opt="Unroll loop" at="Sequentia1Scan():19" I>
</snippet>
</spiff>
</candidateSnippets>
7 Example 15 1 Spiff Maker 2 This is the simplest use case, as there are no instances. We explore four variants of 3 this case.
4 Variant 1: A single static implementation:
Consider the following input Candidate Snippet, corresponding to a single invariant 6 in minidb that should result in a static spiff implementation, shown in Example 16.
<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xsd">
<spiff createAt="compileTime" valueRead="OpenTable():8">
<invariantIntervalSet variable="ExecuteQuery()::query->schema->num_columns"
start="ExecuteQuery():1" value="3">
<interval stop="ExecuteQuery():33"/>
</invariantIntervalSet>
<snippet existsFrom="WriteTableHeader():3" existTo="main():57"
replaceFunction="ExecuteQuery()">
<suggest opt="Constant fold" on="num_columns"
at="ExecuteQuery():7" />
<suggest opt."Unroll loop" at="ExecuteQuery():21" />
</snippet>
</spiff >
</candidateSnippets>
7 Example 16 8 = createAt="compileTime" states that this spiff should have a static 9 implementation.
= valueRead="OpenTable( ) :8" states where the variable is read from the 11 outside world, thus indicating where a spiff might be selected.

1 existsFrom="WriteTableHeader( ) :3" states where the variable is 2 written to the outside world, thus indicating where a dynamic spiff might be 3 created. For static spiffs, this can be safely ignored.
4 = exist sTo="main ( ) :57" if the "outside world" is a file, states where the file is deleted/removed, thus indicating where a dynamic sniff might be 6 garbage-collected. For static spiffs, this can be safely ignored.
7 = replaceFunction="ExecuteQuery( )" states a function that should be 8 specialized. Here there is just one, but in general there can be many.
9 * val ue="3" states for which value of the invariant variable should a spiff be generated, in this case, statically. Here there is just one, but in general there 11 can be many.
12 This input is telling Spiff Maker to make a sniff pattern, with one spiff pattern 13 function based on Exec ut eQuery ( ), and to make a static implementation that specializes 14 the variable ExecuteQuery( ) : query- >schema- >num_columns to the single literal value 3.

17 Variant 2: Specialization at a finer granularity:
18 The above example showed a spiff applied to replace an entire function 19 (ExecuteQuery( )). In fact, we can observe that only a small code segment in that function involves an invariant. Thereafter, we can convert just that small code segment into a spiff, as 21 shown in the following snippet.
22 The candidate snippet shown below (shown in Example 17) is very similar to before, 23 with the interval much smaller than the entire ExecuteQuery( ) function, rather just three 24 lines of the for loop. The replac eF unct ion attribute thus disappears.
Finally, the constant fold suggestion for line 21 is omitted, because that line is not in the interval to be 26 specialized. (We could have left it in, to be ignored.) <?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xsd">
<spiff createAt="compileTime" valueRead="OpenTable() :8" >

<invariantIntervalSet variable="ExecuteQuery()::query->schema->num_columns"
start="ExecuteQuery():19" value="3"
<interval stop="ExecuteQuery():22"/>
</invariantIntervalSet>
<snippet existsFrom="WriteTableHeader():3" existTo="main():57"
<suggest opt="Unroll loop" at="ExecuteQuery():21" 1>
</snippet>
<spiff>
</candidateSnippets>
1 Example 17 2 Variant 3: Employing fixed array of implementations:
3 In our particular example, we decide to identify the spiff implementations with a 4 one-byte integer. Therefore, we can have a total of 255 implementations from the same spiff pattern, with the num_columns variable acting as the spiff-pattern parameter, varying from 6 1 to 255 (0 is chosen to represent the invalid value). So the candidate snippet shown below 7 in Example 42 doesn't have a value just of 3, but all values from 1-255 (that is, the 8 fromValue and toValue attributes of the invariantIntervalSet element). We also 9 revert back to a full function replacement.
<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemaLocation="candidateSnippets.xsd"
<spiff createAt="compileTime" valueRead="OpenTable():8"
<invariantIntervalSet variable="ExecuteQuery()::query->schema->num_columns"
start="ExecuteQuery():1" fromValue="1" toValue="255"
<interval stop="ExecuteQuery():33"/>
</invariantIntervalSet>

<snippet existsFrom="WriteTableHeader():3" existTo="main():57"
replaceFunction="ExecuteQuery()">
<suggest opt="Constant fold" on="num_columns"
at="ExecuteQuery():7" />
<suggest opt="Unroll loop" at="ExecuteQuery():21" I>
</snippet>
</spiff>
</candidateSnippet>
1 Example 18 3 Variant 4: Dynamic spiff:
4 In reality, each column in a table can be of a particular datatype.
Assuming there are eight datatypes (int2, int4, char, varchar, etc), a static table spiff for a three-column 6 table requires 3^8 possible implementations. Hence, a dynamic table spiff is more suitable in 7 this scenario.
8 The candidate snippet given below (see Example 19) states this with the createAt 9 attribute, which here specifies where in the application the spiff is created, that is, within the CreateTable( ) function (the createAt attribute, which in the previous example was 11 compileTime) as well as where the spiff is to be instantiated, that is, within the 12 OpenTable() function (the instantiateAt attribute). There are no fromValue or 13 toValue attributes in the invariantIntervalSet element, as the num_columns value 14 is supplied when the snippet is instantiated. The one other important difference from the previous example is the additional optimization suggestion to constant fold on 16 column definitions.

<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
xsi:noNamespaceSchemalocation="candidateSnippets.xse>
<spiff createAt="createTable():31" instantiateAt="OpenTable():8" >

<invariantIntervalSet variable="ExecuteQuery.query->schema->num.num_columns"
start="ExecuteQuery():1" value=">0"
<interval stop="ExecuteQuery():33"/>
</invariantIntervalSet>
<invariantIntervalSet variable="ExecuteQuery.query->schema->column_definitions"
start="ExecuteQuery():1" value="W>
<interval stop="ExecuteQuery():33"/>
</invariantIntervalSet>
<snippet existsFrom="WriteTableHeader():3" existTo="main():57"
replaceFunction="ExecuteTable()">
<suggest opt="Constant fold" on="num_columns"
at="ExecuteQuery():7" />
<suggest opt="Constant fold" on="column_definitions"
at="ExecuteQuery():7" I>
<suggest opt="Unroll loop" at="ExecuteQuery():21" />
</snippet>
</spiff>
</candidateSnippets>
1 Example 19 2 Unlike creating a static spiff, a dynamic spiff is created at runtime by inserting a call 3 to compile the spiff into CreateTable(), for a table spiff.
4 The Design of Various Types of Spiffs (Here we use examples from the open-source Postgres DBMS.) 6 Predicate Query Spiff 7 This spiff is utilized by evaluating both regular predicates within queries, such as 8 o_orderdate >= date ' 19940801 1, and join predicates, such as o_orderkey 9 l_orderkey.

1 These predicates are evaluated by the ExecQual ( ) function (in Postgres).
2 Specifically, predicates are normally represented in a linked list.
ExecQual( ) iterates 3 through this list and invoke particular evaluation function corresponding to each individual 4 predicate. The code excerpt presented in Example 20 (from PG 9.3 stock, src/backend/executor/execQual.c:5125) shows such logic.
bool ExecQual(List *qual, ExprContext *econtext, bool resultForNull) = =
foreach(1, qual) //line 4157 ExprState *clause - (ExprState *) lfirst(1);
Datum expr_value;
bool isNull;
expr_value = ExecEvalExpr(clause, econtext, StisNull, NULL);
= = =

6 Example 20 '7 The per-predicate evaluation function is stored within the clause variable. For each 8 predicate, in the form of a > b, there are three components, operand #1, an operator, and 9 operand #2. In Postgxes, the operator is evaluated by function Exec EvalOper. This function (see Example 21) essentially performs a look up according to the type of operator 11 and fetches the address of the actual type-specific comparison function.
Exec EvalOper( ) 12 also requires that the operands to be stored in another linked list. In many cases, the length 13 of this list is two. Below is an example to specialize this function on those cases.
Datum ExecEvalOper(FuncExprState *fcache, ExprContext *econtext, bool *isNull, ExprDoneCond *isDone) . . .
/* Initialize function lookup info */
init_fcache(op->opfuncid, op->inputcollid, = = =

fcache->xprstate.evalfunc =
(ExprStateEvalFunc)ExecMakeFunctionResultNoSets;
return ExecMakeFunctionResultNoSets(fcache, econtext, isNull, isDone);
1 Example 21 2 Note that an optimization to Exec EvalOper() is that it is executed just once to 3 perform the comparison function look up. It then stores into xprstate.
evaifunc a 4 different function. It also calls that function once to do the predicate.
Subsequent evaluations of the operator is done by ExecMakeFunctionResultNoSet s ( ) (for scalar predicates 6 considered within our current specialization scope).
7 ExecMakeFunctionResultNoSets() then iterates through the list of operands by 8 calling the argument-extraction function for each operand.
9 ExecEvalExpr is a macro, defined in src/include/executor/executor.h:72 as:
#define ExecEvalExpr(expr, econtext, isNull, isDone) \
11 ( (*(expr)>evalfunc) (expr, econtext, isNull, isDone)) 12 So if the operand is a constant, the ExecEvalConst() would be called, and finally, 13 invokes the comparison function.
14 The bottlenecks observed in predicate evaluation are, first, the loop that iterates through just two elements in the operand list, and second, the extraction of individual 16 operands. Specifically, we observe that for a regular predicate, one operand is normally a 17 table column and the other operand is a constant. In such a case, the constant's value (or 18 address) can be directly "stored" in the code rather than having to invoke multiple functions 19 to fetch it. In addition, the original implementation requires multiple function calls to extract the column ID for the table-originated operand. Similarly, this column ID can be stored 21 directly into the specialized code.
22 For a join predicate, both operands are non-constant. The origin of an operand can be 23 of one of three types, that of INNER_VAR (/), OUTER_VAR (0), and scant uple (S).
24 The origin of the operand is also an invariant given a query. By knowing this invariant, we can further simplify the routine that extracts the actual operand's value.
Note that although 26 theoretically, there are 9 possible combinations for the origins of the two operands, in 27 reality, only the following combinations are allowed.

Operand 1 Operand 2 1 Hashjoin Query Spiff 2 In function ExecHashJoin( ), defined in file src/backend/executor/nodeHashjoin.c.
3 The variable node->js.jointype is invariant for a given query. Depending on the query, 4 it will take on one value from the set {JOIN ANTI, JOIN_SEMI, JOIN_LEFT, JOIN INNER}.
6 In the same file and function, the variable List *joinqual is also invariant for a 7 given query.
8 Hashjoin Query Spiff eliminates entire branches in the code, thereby reducing the 9 number of if statements and, more importantly, the size of the code.
The analysis has to be able to handle complex data structures involving pointers and 11 heap-allocated structs. For example, to eliminate if statements in the body of the for loop in 12 ExecHashJoin(), we have to be able to reason about expressions like the following 13 (Example 22).
*define 1-17_FILL_INNER(hjstate) ((hjstate)->hj_NullOuterTupleSlot != NULL) #define HJ_FILL_OUTER(hjstate) ((histate)->hj_NullinnerTupleSlot != NULL) #define TupIsNull(slot) ((slot) == NULL II (slot)>tts_isempty) if (HJ_FILL_INNER(node)) = = =
else if (1-0_FILL_OUTER(node) II
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty)) if (hashtable->totalTuples == 0 && !1-D_FILL_OUTER(node)) - = =
if (TupIsNull(outerTupleSlot)) = = =
if (batchno != hashtable->curbatch &&

node->hj_CurSkewBucketNo == INVALID_SKEW_BUCKET_NO) 1 Example 22 2 Page Spiff 3 A page spiff utilizes invariant(s) within a disk/memory page with which the DBMS
4 manages its storage of data. Often, such invariants could include the number of rows stored on the page, the free space remained, and whether the page is empty or full.
In postgres' 6 page-scan routines, there are additional invariants, such as the scan direction and scan mode 7 (pageat at ime).
8 More interestingly, page spiffs can enable more aggressive optimizations. For 9 instance, a page spiff can reorganize the data layout once the page is read into memory to optimize data locality. In addition, once data layout is changed, instead of following the 11 existing function-call sequence in further process, the page spiff can invoke these calls in a 12 block-at-a-time manner, thereby improving instruction locality as well.
13 A page spiff is possible to specialize a long sequence of calls that eventually access 14 data, passing up the data a ways, where it is possible to specialize out a lot of code in the called functions.
16 The chief benefit of the page spiff is to inline the called functions to produce a single 17 specialized function that can fit in the instruction cache. Once this transformation is done, 18 three other mutually exclusive transformations become available.
19 1. Eager invocation of the GetColumnsToLongs ( ) speccode routine: as soon as the packed tuple is extracted from the page, convert to a unpacked t upl et ables lot, then 21 store it in the array manipulated by the speccode routine.
22 2. Eager partial unpacking: have the code that calls the specialized code compute the 23 maximum column that will be needed, and only unpack the columns up to there.
24 3. Lazy unpacking: do multiple unpacking operations, wherever the original code called 26 GetColumnsToLong().
27 A variant of this uses the selectivity of the select to decide. If the selectivity is high, 28 meaning that only a few rows will be referenced, use lazy unpacking before applying the 29 predicate.

1 In general, it is best to place the call to GetColumn s To Lo ng ( ) such that the 2 execution can maximize instruction cache locality.
3 Aggregate Query Spiff 4 Aggregate spiffs are designed to improve the efficiency of the SUM and AVG
aggregate functions. In particular, we have found that in evaluating aggregate functions with 6 the numeric datatype, Postgres incurs a significant overhead in performing memory 7 allocation and deallocation. In particular, aggregate spiffs avoid such memory-management 8 overheads.
9 In Postgres, a numeric type is represented by a byte string, with each digits stored in an array of NumericDigit. This representation allows very fine precision control but 11 sacrifices performance by needing to perform essentially string-based arithmetic operations.
12 In the generic implementation, the reason of having to perform per-row based 13 memory allocation is that for each input row, the number of digits present in the value for 14 each row can differ. Especially in evaluating a * b, the resulting value's range can go far beyond that of the input values. Nevertheless, there is a constant 16 (NUMERIC _MAX PRECISION) in Postgres that defines the max number of digits that can be 17 supported for a numeric value. The aggregate spiffs utilize this value to slab-allocate a spiff 18 data section, which is then reused by computing the corresponding aggregate function across 19 all the input rows, thereby eliminating per-row memory allocation.
Note that the evaluation of an aggregate function consists of two steps. For instance, 21 given an aggregate function SUM( a + b), the first step is to evaluate the result of 22 expression a + b. The second step is then to accumulate the values of a + b for all the 23 input rows. In PostgreSQL both a + b and the SUM( ) function are evaluated using the 24 numeric_add( ) function. This function takes two inputs. In the case of a + b, the two inputs are a and b, respectively. In the case of computing SUM( x), the second input is the x, 26 which can essentially come from a scanned row. The first input is a transition value, which 27 is the current sum of the rows that have been processed up to this point.
28 Evaluating SUN( ) 29 According to numer c_add( ), the two inputs are added and the resulting value is copied into the returning res variable allocated by make_result Q. res is then returned 31 back to adv a nce_t ransit ion_funct ion( ) within nodeAgg.c, which copies this 1 returned value into pergroupstate->transValue and then frees the returned value. The 2 next time advance_transition_f unction ( ) is executed to process the next row, the 3 transValue is copied to the first input value of numeric_add( ) via the following 4 snippet.
fcinfo->arg[0] = pergroupstate->transValue;
6 fcinfo->argnull[0] = pergroupstate->transValueIsNull;
7 This logic indicates that t ran sVal ue can in fact be shared without being freed 8 across all the rows. Therefore, for the data section of the EvaluateNumericAdd spiff, at 9 the beginning of the aggregate evaluation, the necessary variables are allocated, namely agg_temp_values->result_value and agg_temp_values-> result_arg, by using 11 AllocateAggTempValues( ). (Note that these two variables represent the same value but 12 Postgres requires two such variables as the return value and as a temporary computation 13 argument, respectively.) 14 Evaluating Expression As discussed earlier, another usage of numer c_add( ) is in computing arithmetic 16 expression, such as a + b. In this case, the variable that stores the evaluation result of the 17 expression are reused, which is previously allocated by make_result().
This variable is 18 added to the spiff's data section as agg_temp_values->expr_result_arg.
19 Unlike in the case of evaluating SUMO, where the first input comes directly from agg_temp_values->result_value within the spiffs data section, both inputs in 21 evaluating a + b are regular variables that are required to be obtained using existing 22 Postiges' implementation. In fact, in evaluating a + b, numeric_add ( ) is invoked from 23 ExecEvalOper() within execQual.c. So similar to the predicate spiff, a spiff 24 (EvaluateAggregateExpression) is created that specializes the ExecMakeFunctionResultNoSets( ) function. This spiff then invokes the expression-26 evaluation version of the EvaluateNumericAdd spiff.
27 In addition to +, an expression can include other operations such as *, and I. The 28 functions that evaluate these operations are also specialized in the same fashion as 29 numeric_add().
In summary of the EvaluateNumericAdd spiff, the following invariants are 31 considered.

1 1). The caller/execution path where numer ic_add ( ) is invoked. This can be either 2 from evaluating an expression in execQual.c or from evaluating the SUM( ) function in 3 nodeAgg.c.
4 2). In evaluating expressions, the result value's memory location can be invariant.
3). In evaluating SUM( ) , both the result value's memory location and the first input's 6 memory location can be invariant. In addition, these two variables can even share the same 7 memory location.
8 4). Sharing a common memory segment across all the rows is permitted by the 9 constant that bounds the maximal precision of the numeric datatype.
String Matching Spiff 11 Say we have a C function, match, which matches a string x with another string 12 pattern (containing wildcards and other special characters), y. If we know stringy in 13 advance of query execution (maybe it is a query constant), then we can create a specialized 14 function for matching arbitrary strings to this particular string pattern.
One approach for specialization would be to first create the following query 16 specialization code (speccodes):
17 = one each for constant string of length 1-32 18 = one for the '%' query string character 19 We could then string various combinations of these speccodes together to make a specialized function for matching a string to y. For example, say that we have the pattern 21 "%abc%defgr, and we want to create a specialized function for matching arbitrary strings 22 to it. We would string together the following speccodes:
23 = a % speccode 24 = a 3-character speccode, to match "abc"
= a % speccode (could be the same as the first one) 26 = a 4-character speccode, to match "defg"
27 = an ending % speccode 28 Each of these speccodes would assume that there are more characters in the string left to 29 match after it has completed. Once one of the speccodes has finished matching, it would pass the rest of the string on to the next speccode in the sequence to continue the matching 31 process.

1 Matching the constant portions of the string would be accomplished using a 2 combination of longlong, long, short, and char combos.
3 Given an arbitrary query string, it would be easy to instantiate a sequence of query spiff 4 function pointers, each one of which except the last invoked the next stage by a query spiff invocation using the spiff id stored as a local variable.
6 The following (Example 23) shows an example of how this would be implemented 7 for the string "%abc%defg%" (using pseudocode).

9 /* For matching "abc" */
length3match(char *s, char *t) {
11 match first 3 characters of s and t 12 remove first 3 characters from s and t 13 return 14 }
16 /* For matching "defg" */
17 length4match(char *s, char *t) {
18 match first 4 characters of s and t 19 remove first 4 characters from s and t return 21 }

23 /* For matching an arbitrary "%" */
24 match%(char *s) match n characters in s to 'V
26 remove first n characters from s 27 return 28 }

/* For matching a "%" at the end of a pattern */
31 match%end(char *s) 1 match any number o-F characters 2 return 3 }
4 Example 23 Once we have these speccode-routines created, we will construct a sequence of functions 6 calls as an array to match a string to this pattern. The array would look like in Example 24.
function pointers match%
-length3match _ .
mat c h%
length4match mat ch%end 7 Example 24 8 These functions would then be called in sequence to match the string.
Constant 9 portions of length greater than 32 could be broken up into segments, so a string of length 65 would require three instantiated speccodes, of 32, 32, and 1 character.
11 More generally, we have a method with a subset of arguments that are invariant.
12 These invariants render some of the if statements, including in recursive calls and within 13 loops, to be deterministic. We unroll this sequence through a series of speccodes that invoke 14 one another. So this appears to be a general transformation that works on loops and on recursive and non-recursive calls.
16 Since the actual sequence of speccodes that get invoked isn't known until runtime 17 (when the actual pattern being matched is available), we could have the spiff instantiator fill 18 in an array of function pointers that specify the sequence of speccodes to invoke.
19 Per-Query Spiff Sequencing Per-Query Spiff Sequencing uses a meta-spiff to traverse an interpreted data 21 structure and hot swapping to convert existing speccode into something similar that would 22 be emitted by a compiler. In some embodiments, the hot-swapping mechanism is used to I turn switch/case blocks into specialized code that can be stitched together at runtime, 2 according to the relationships among various cases. Specifically, when a case is followed 3 by another particular case during execution. Hot-swapping will replace the calls to the 4 branch-based dispatcher to direct jumps to the target branch. This applies to general dispatcher and the interpretive execution model. Instead of interpreting query plans and 6 invoke corresponding plan-node specific functions, all dispatcher calls can be replaced by 7 direct jumps to the child plan node, given a plan tree.
8 Specialized Code Storage When the specialization is invoked, a specialized code (speccode) is generated and may be stored in various locations along the field specialization process. For example, 11 speccodes may involve invariants both from the oil field data 220 and the oil field simulator 12 230. Speccodes may involve invariants both from the config params 210 and the oil field simulator 230. In some embodiments, speccode may be stored in the Linux operating 14 system 230 may involve invariants from the simulator and oil field data.
In some embodiments, speccode may be stored in an external router or an external cloud service.
16 The speccode stored within the simulator may be originating from the oil field data 17 and from the simulator, and may be identified through basic field specialization. Other 18 spiffs are stored with the operating system, router, and cloud, specializing code found in the 19 indicated applications. In some embodiments, speccodes may flow from where they are stored to where they may be invoked (the application that provided the specialization 21 candidate from which they were subsequently specialized). For example, the oil field data 22 may store speccodes to be invoked at the external the router. In some embodiments, 23 speccode identifiers can reside with data or with applications and can also be included in 24 communications with subsequent applications, indicating the relevant speccode to (later) invoke.
26 FIG. 3 is an illustration of Field specialization for elaboration a paradigm of computer 27 science with an exemplary embodiment provided by this disclosure. The diagram comprises 28 four quadrants 310, 320, 330 and 340 for scenarios of data representing as data, code 29 representing as data, data representing as code, and code representing as code, respectively.
In the early stages of computer architecture, from the Babbage machine through the 31 1930's, data was differentiated from code. The data was what was manipulated and the program code was the instructions for how to manipulate the data to effect a computation.

This is represented in quadrant 310 in Figure 3 as data represented in binary format in a computer memory or storage device, that is, data stored as data, and source code represented 4 in some other way (e.g., patch cords), that is, code represented as code.
In the 1940's, John von Neumann proposed a revolutionary architecture that stored 6 the program in machine code in the computer's memory as numbers, mixing code and data.

(Indeed, the code can be manipulated as data, and even changed during the running of the program.) This architecture is represented in quadrant 320, with code (machine instructions) 9 represented as data.
In the 1960's there were some initial forays into combining code, in the form of a Lisp function, with data, say the value of one of the arguments, to produce a Lisp continuation, which is a Lisp function (code) paired with the value of that parameter, which 13 is just a function with one less argument. This is in a very particular way that data are 14 stored/encapsulated in code, as represented in quadrant 330.
In the 1980's the Postscript language was invented. This was code that when executed would create an image. Postscript is produced by a formatter, taking a document such as a Microsoft Word file, which is data, and converting to a program, again, code as data, as represented in quadrant 320. The Postscript file produced from the Microsoft Word file is not an image to be directly printed, but instructions for drawing each letter of the document, so that that program could be executed, for example, within a postscript printer or 21 by a postscript conversion program, to produce a bit-mapped image of the document.

Field specialization takes this idea further. Field specialization takes the values of invariants, that is, data, and uses these values to create a specialized code version of a portion of an application, such as a DBMS, which is code that can be executed.
So a relation speccode is the result of specializing DBMS code using the schema of a relation (data). A

tuple speccode is the result of using the data values within a tuple (row of a table). An 0/S

speccode is a specialization of a snippet of an operating system based on particular data 28 values of particular invariants within that snippet; ditto for router speccodes.

This data-represented-as-code (as represented in quadrant 330) can be created in one application from a snippet in that application or another application, passed around between applications, and invoked by the target application when appropriate. The field specialization technology provides the means for identifying when such speccode are effective in increasing performance, when they should be created, with which invariants should they be specialized upon, how they can be communicated across applications, and 4 when they should be invoked.
The implication is that for any coherent region within a data file, it is possible to ascertain the invariant values within that region, follow those values into areas of the application code, and then make speccodes out of those areas, then associate those speccodes back to their regions. This perspective thus focuses on the originating data, rather 9 than starting with the code and specializing it.
It should be emphasized that the above-described embodiments of the present disclosure, particularly, any "preferred" embodiments, are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present disclosure and protected 17 by the following claims.

Claims

What is claimed is:

1. A computer-implemented method for improving the performance of a computer program code, comprising:
identifying, based on a Program Representation (PR), that is, an abstract syntax tree or other embodiment of the computer program code, invariant intervals for variables in the computer program code;
deducing, based on the PR and an Ecosystem Specification for the computer program, program interactions within the computer program;
deducing, based on the PR, the identified invariant intervals for variables in the computer program code and the deduced program interactions, the domain assertions;
identifying, based on the invariant intervals for variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the deduced program interactions and the deduced domain assertions, one or more candidate snippets;
generating specialized computer program code, based on the one or more candidate snippets; and modifying the computer program code based on the generated specialized computer program code; and concealing the specialized computer program code.

2. The computer-implemented method of claim 1 characterized by one or both of the following features:
(a) wherein the identified invariant intervals span multiple executions;
and (b) wherein the identified invariant intervals comprises at least set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node.

3. 'The computer-implemented method of claim 1 or claim 2, wherein each of the one or more candidate snippets comprises (a) an interval of code identified by the PR, or (b) a set of invariants and a set of possible values for each variant.

4. The computer-implemented method of any of claims 1-3, wherein each of the one or more candidate snippets comprises an appropriate lifetime of the Candidate Snippet, and wherein each of the one or more candidate snippets preferably comprises suggested optimizations to be employed within the appropriate lifetime of the Candidate Snippet.

5. The computer-implemented method of any of claims 1-4, wherein generating specialized computer program code comprises (a) inserting codes in places within the computer program to create the specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, or (b) creating a specialized function for matching arbitrary strings to a given string pattern, or comprises using a meta-specializer to traverse an interpreted data structure and hot swapping to convert existing specialized computer program code, or involves query, or comprises eliminating branches in the computer program code and thus reducing the size of the computer program code, or comprises utilizing a numeric value to slab-allocate a specializer-in-the-field (Spiff) data section, wherein the Spiff data section is then reused by computing a corresponding aggregate function across all input rows within the computer program code to eliminate memory allocation per-row, wherein the numeric value is defined by max number of digits that can be supported per row, or comprises utilizing invariant within a disk or memory page with which the computer program is stored, reorganize the data layout once the page is read, and optimize data locality.

6. The computer-implemented method of any one of claims 1-5, wherein the specialized computer program code is created at a run time and invoked at a later time., and optionally, further comprising determining whether any violations of the identified invariable intervals occur in a given execution.

7. A system configured for improving the performance of a computer program, comprising:
an Invariant Finder identifying invariant intervals for variables in the computer program, based on a Program Representation (PR) of the computer program;
an Interaction Deducer deducing program interactions within the computer program based on the PR and an Ecosystem Specification for the computer program,;
a Domain Assertion Deducer deducing domain assertions based on the PR, the identified invariant intervals for variables in the computer program and the deduced program interactions;

a Snippet Finder identifying one or more candidate snippets based on the invariant intervals for variables in the computer program, the PR, one or more execution summaries associated with the computer program, the deduced program interactions and the deduced domain assertions;
a specializer-in-the-field (Spiff) Maker generating specialized computer program code based on the one or more candidate snippets; and a specialized source receiving the generated specialized computer program code to modify the computer program code based on the generated specialized computer program code.

8. The system of claim 7, wherein the Invariant Finder performs static analysis on the PR to identify invariant intervals.

9. The system of claim 7 or claim 8, wherein the identified invariant intervals comprises at least set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node.

10. The system of any one of claims 7-9, further comprising an invariant check determining whether any violations of the identified invariable intervals occur in a given execution.

11. The system of anyone of claims 7-10, wherein each of the one or more candidate snippets comprises a set of invariants and a set of possible values for each variant.

12. The system of anyone of claims 7-11, wherein each of the one or more candidate snippets comprises an appropriate lifetime of the Candidate Snippet, and wherein each of the one or more candidate snippets preferably comprises suggested optimizations to be employed within the appropriate lifetime of the Candidate Snippet.

13. The system of anyone of claims 7-12, wherein the Spiff Maker is further configured to insert codes in places within the computer program utilized to create the specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, or to create a specialized function for matching arbitrary strings to a given string pattern, or to use a meta-specializer to traverse an interpreted data structure and hot swapping to convert existing specialized computer program code, or to combine query invariants in effect during a query with schema and row invariants, or to eliminate branches in the computer program code and thus reduce the size of the computer program code, or to utilize a numeric value to slab-allocate a specializer-in-the-field (Spiff) data section, wherein the Spiff data section is then reused by computing a corresponding aggregate function across all input rows within the computer program code to eliminate memory allocation per-row, wherein the numeric value is defined by max number of digits that can be supported per row, or to utilize invariant within a disk or memory page with which the computer program is stored, reorganize the data layout once the page is read, and optimize data locality.

14. A non-transitory computer readable medium comprising computer executable instructions that, when executed by a processor of a computing device, cause the computing device to:
determine a variable in a computer program whose value is invariant within an identified invariant interval via static analysis on a Program Representation (PR) of the computer program;
produce codes in places within the computer program to create a specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, the a specialized computer program code being created based at least on the determined variable; and modify at least a part of the computer program with the specialized computer program code when the specialized computer program is invoked.

15. The non-transitory computer readable medium of claim 14, wherein producing the specialized computer program code is further based on at least one of domain-specific knowledge associated with the computer program and extra-source knowledge.