
CN117581215A - Colorless root implementation in Z garbage collector

Publication number: CN117581215A
Application number: CN202280046281.7A
Applicant/Assignee: Oracle International Corp
Inventors: E. Österlund, P. Lidén, S. M. R. Karlsson
Priority claimed from: US application No. 17/303,634 (US11741004B2); PCT/US2022/029225 (WO2022245659A1)
Other languages: Chinese (zh)
Prior art keywords: heap, class, bit, bits, garbage collection
Legal status: Pending


Abstract

A request is received from a mutator (application) thread to load a first reference to a first object from heap memory onto a call stack of the application thread (902). In response to receiving the request, the system retrieves the first reference from heap memory (904). The system performs a shift operation that (a) removes one or more bits representing a first garbage collection state and (b) generates a second reference from the first reference. Based on a particular bit of the one or more bits removed from the first reference by the shift operation, the system determines whether to perform a set of garbage collection operations on the first reference to bring the first reference into a good state (908). The second reference is stored onto the call stack without any indication of any of the plurality of garbage collection states.

Description

Colorless root implementation in Z garbage collector
Incorporation by reference; disclaimer
The following applications are hereby incorporated by reference: Application No. 17/303,634, filed June 3, 2021; Application No. 63/190,617, filed May 19, 2021; Application No. 63/190,621, filed May 19, 2021; Application No. 63/190,625, filed May 19, 2021. The applicant hereby rescinds any disclaimer of claim scope made in the parent application(s) or the prosecution history thereof and advises the United States Patent and Trademark Office (USPTO) that the claims in this application may be broader than any claim in the parent application(s).
Technical Field
The present disclosure relates to generational garbage collectors. In particular, the present disclosure relates to shift-based colorless root implementations in garbage collectors.
Background
The compiler converts source code, written according to a specification intended for the convenience of the programmer, into machine code (also referred to as "native code" or "object code"). Machine code may be executed directly by a physical machine environment. Additionally or alternatively, the compiler converts the source code into an intermediate representation (also referred to as "virtual machine code/instructions"), such as bytecode, which is executable by a virtual machine capable of running on top of a variety of physical machine environments. The virtual machine instructions may be executed by the virtual machine in a more direct and efficient manner than the source code. Converting source code into virtual machine instructions includes mapping source code functionality, according to the specification, to virtual machine functionality that utilizes the underlying resources (such as data structures) of the virtual machine. Often, functionality that the programmer expresses in simple terms via source code is converted into more complex steps that map more directly onto the instruction set supported by the underlying hardware on which the virtual machine resides.
The virtual machine executes the application and/or program by executing an intermediate representation of the source code, such as bytecode. An interpreter of the virtual machine converts the intermediate representation into machine code. When an application is executed, a certain amount of memory (also referred to as "heap memory") is allocated for objects created by the program. A garbage collection system may be used to automatically reclaim memory locations occupied by objects that are no longer used by the application. The garbage collection system eliminates the need for a programmer to explicitly specify the objects to be released. A generational garbage collection scheme is based on the empirical observation that most objects are used only for a short period of time. In generational garbage collection, two or more allocation regions (generations) are designated and kept separate based on the age of the objects contained therein. New objects are created in a regularly collected "young" generation, and when a generation fills up, objects that are still in use (still referenced) are copied (i.e., "promoted") to the next older generation. Occasionally a full scan of the entire heap is performed.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Thus, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Drawings
In the various figures of the drawings, embodiments are illustrated by way of example and not by way of limitation. It should be noted that references to "an" embodiment in this disclosure are not necessarily to the same embodiment, and they mean at least one. In the drawings:
FIG. 1 illustrates an example computing architecture in which the techniques described herein may be practiced.
FIG. 2 is a block diagram illustrating one embodiment of a computer system suitable for implementing the methods and features described herein.
FIG. 3 illustrates an example virtual machine memory layout in block diagram form in accordance with an embodiment.
Fig. 4 illustrates an example frame in block diagram form in accordance with an embodiment.
FIG. 5 illustrates an execution engine and heap memory of a virtual machine, according to an embodiment.
FIG. 6 illustrates heap references and dereferencing references, in accordance with an embodiment.
FIG. 7 illustrates a reference load barrier according to an embodiment.
FIG. 8 illustrates a reference write barrier according to an embodiment.
FIG. 9 illustrates a set of operations for loading heap references by an application thread, according to an embodiment.
FIG. 10 illustrates a set of operations for writing heap references by an application thread, in accordance with an embodiment.
FIG. 11 illustrates a system in accordance with one or more embodiments.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding. One or more embodiments may be practiced without these specific details. Features described in one embodiment may be combined with features described in a different embodiment. In some instances, well-known structures and devices are described in block diagram form in order to avoid unnecessarily obscuring the present invention.
1. General overview
2. Architecture overview
2.1 Example class File Structure
2.2 Example virtual machine architecture
2.3 Loading, linking and initializing
3. Garbage collection
4. Load barrier and write barrier
5. Loading heap references by application threads
6. Writing heap references by application threads
7. Other aspects; extensions
8. Hardware overview
1. General overview
The virtual machine executes the application and/or program by executing an intermediate representation of the source code, such as bytecode. An interpreter of the virtual machine converts the intermediate representation into machine code. When an application is executed, a certain amount of memory (also referred to as "heap memory") is allocated for objects created by the program. A garbage collection system may be used to automatically reclaim memory locations occupied by objects that are no longer used by the application. The heap memory may be divided into multiple generations for the purpose of storing objects. In particular, the heap memory may include a portion designated as the "young generation" for storing newly created objects, and a portion designated as the "old generation" for storing older objects. In an embodiment, a generational garbage collector may collect garbage by traversing the entire heap memory or by traversing only a portion of the heap memory. For example, the garbage collector may traverse only the portion of heap memory designated as the young generation.
One or more embodiments include performing garbage collection based on garbage collection states (also referred to as "colors") that are (a) stored with heap references, but (b) not stored with references that may be used to access the underlying objects as part of application execution. A set of Garbage Collection (GC) states is used to track the progress of GC operations with respect to a heap reference. The heap reference includes an indication of the GC state associated with the heap reference. Based on the GC state associated with the heap reference and the current phase of the current GC cycle, certain GC operations are selected to process the heap reference. Meanwhile, references that may be used to access the underlying object as part of application execution (also referred to as "dereferenceable references") do not include any indication of any of the set of GC states. Such dereferenceable references include, for example, references stored on a call stack.
One or more embodiments include implementing a reference load barrier when loading references from heap memory to a call stack. An application thread, which may run concurrently with GC threads, requests that a reference be loaded from heap memory onto the call stack. As described above, the heap reference includes a "color" that indicates the GC state at the time the heap reference was stored. A set of operations is performed on the reference retrieved from heap memory to (a) determine whether the GC state indicated by the color is "good" with respect to (e.g., matches at least a portion of) the current phase of the current GC cycle, and (b) modify the reference by removing the color from the reference. If the GC state indicated by the color is not good, then a set of GC operations is performed to change the heap reference from its current state to a good GC state, and the heap reference is updated to indicate the good GC state. Thereafter, the modified reference is stored on the call stack. The reference on the call stack that points to the same object as the heap reference does not include any indication of any of the GC states.
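The Java fragment below is only an illustrative sketch of such a shift-based load barrier, not the claimed implementation. It assumes a layout in which the heap word holds the object address in its upper bits and a 16-bit transient color in its low bits, and it assumes that the "particular bit" examined is the highest color bit removed by the shift; the names COLOR_BITS, slowPath, and goodColor are hypothetical.

```java
// Illustrative sketch of a shift-based reference load barrier (assumptions noted above).
final class LoadBarrierSketch {

    static final int COLOR_BITS = 16;   // assumed width of the transient color

    /** Loads a colorless, dereferenceable reference from a colored heap word. */
    static long loadReference(long coloredHeapWord, long goodColor) {
        // One shift (a) removes the color bits and (b) produces the second reference.
        long colorless = coloredHeapWord >>> COLOR_BITS;

        // The "particular bit" removed by the shift: here, the top color bit,
        // assumed to be set when the stored color is good for the current phase.
        boolean goodBitSet = ((coloredHeapWord >>> (COLOR_BITS - 1)) & 1L) != 0;

        if (!goodBitSet) {
            // Slow path: bring the reference to a good GC state (mark/remap and
            // self-heal the heap slot), then strip the color again.
            colorless = slowPath(coloredHeapWord, goodColor) >>> COLOR_BITS;
        }
        return colorless;   // stored on the call stack without any GC state
    }

    static long slowPath(long coloredHeapWord, long goodColor) {
        // Placeholder: a real collector would also remap the address and
        // compare-and-swap the healed word back into the heap slot.
        return ((coloredHeapWord >>> COLOR_BITS) << COLOR_BITS) | goodColor;
    }
}
```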
One or more embodiments include implementing a reference write barrier when writing references to heap memory. An application thread, which may run concurrently with GC threads, requests that a reference be written to heap memory. The reference does not necessarily include any indication of any of the GC states prior to being written to heap memory. A good GC state with respect to the current phase of the current GC cycle is determined. After an indication of the good GC state is added to the reference as its current GC state, the reference is written to heap memory. The reference on heap memory that points to the same object as the reference being written includes an indication of the current GC state of the reference.
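A matching sketch for the write side, under the same assumed layout: the colorless reference is shifted back up and the currently agreed "good" color is OR-ed into the low bits before the store. The names and the 16-bit width are again assumptions, not the claimed implementation.

```java
// Illustrative sketch of a reference write barrier under the same assumed layout.
final class WriteBarrierSketch {

    static final int COLOR_BITS = 16;   // assumed width of the transient color

    /** Colors a dereferenceable (colorless) reference before it is written to the heap. */
    static long colorForStore(long colorlessReference, long currentGoodColor) {
        return (colorlessReference << COLOR_BITS) | currentGoodColor;
    }
}
```

Together with the load-barrier sketch above, this keeps colors confined to heap slots: every store adds the good color and every load strips it before the reference reaches the call stack.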
One or more embodiments described in the specification and/or recited in the claims may not be included in this general overview section.
2. Architecture overview
FIG. 1 illustrates an example architecture in which the techniques described herein may be practiced. The software and/or hardware components described with respect to the example architecture may be omitted or associated with a different set of functions than those described herein. According to one or more embodiments, software and/or hardware components not described herein may be used within an environment. Accordingly, the example environment should not be considered as limiting the scope of any claims.
As shown in fig. 1, the computing architecture 100 includes source code files 101 that are compiled by a compiler 102 into class files 103 that represent programs to be executed. The class files 103 are then loaded and executed by an execution platform 112 that includes a runtime environment 113, an operating system 111, and one or more Application Programming Interfaces (APIs) 110 that enable communication between the runtime environment 113 and the operating system. The runtime environment 113 includes a virtual machine 104 that includes various components such as a memory manager 105 (which may include a garbage collector), a class file validator 106 to check the validity of class files 103, a class loader 107 to locate and build in-memory (in-memory) representations of classes, an interpreter 108 to execute the virtual machine 104 code, and a just-in-time (JIT) compiler 109 to produce optimized machine-level code.
In an embodiment, the computing architecture 100 includes a source code file 101 containing code that has been written in a particular programming language (such as Java, C, C++, C#, Ruby, Perl, and so forth). Thus, the source code file 101 follows a particular set of syntactic and/or semantic rules for the associated language. For example, code written in Java complies with the Java Language Specification. However, since specifications are updated and revised over time, the source code file 101 may be associated with a version number indicating the revision of the specification that the source code file 101 follows. The exact programming language used to write the source code file 101 is generally not critical.
In various embodiments, compiler 102 converts source code written according to specifications intended to facilitate a programmer into machine code or object code that is directly executable by a particular machine environment, or into intermediate representations ("virtual machine code/instructions") (such as bytecodes) that are executable by virtual machines 104 that are capable of running on a variety of particular machine environments. Virtual machine instructions may be executed by virtual machine 104 in a more direct and efficient manner than source code. Converting the source code into virtual machine instructions includes mapping the source code functions from the language to virtual machine functions that utilize underlying resources (such as data structures). Typically, functionality that a programmer presents in simple terms via source code is translated into more complex steps that map more directly to instruction sets supported by the underlying hardware in which virtual machine 104 resides.
Generally, a program is executed either as a compiled program or as an interpreted program. When a program is compiled, the code is transformed globally from a first language to a second language prior to execution. Because the work of transforming the code is performed ahead of time, compiled code tends to have excellent runtime performance. In addition, since the transformation occurs globally prior to execution, techniques such as constant folding, dead code elimination, inlining, and so forth may be used to analyze and optimize the code. However, depending on the program being executed, the start-up time may be significant. Furthermore, inserting new code would require taking the program offline, recompiling, and re-executing. For many dynamic languages (such as Java) designed to allow code to be inserted during program execution, a purely compiled approach may not be appropriate. When a program is interpreted, the code of the program is read line by line and converted to machine-level instructions while the program is executing. As a result, the program has a short start-up time (execution may begin almost immediately), but runtime performance is degraded by performing the transformation on the fly. Furthermore, since each instruction is analyzed individually, many optimizations that rely on a more global analysis of the program cannot be performed.
In some embodiments, virtual machine 104 includes an interpreter 108 and a JIT compiler 109 (or a component that implements aspects of both), and programs are executed using a combination of interpretation and compilation techniques. For example, virtual machine 104 may initially begin by interpreting the virtual machine instructions representing a program via interpreter 108 while tracking statistics related to program behavior, such as how frequently different code portions or code blocks are executed by virtual machine 104. Once a code block exceeds a threshold (becomes "hot"), the virtual machine 104 invokes the JIT compiler 109 to perform an analysis of the block and generate optimized machine-level instructions that are used in place of the "hot" code block for future executions. Since programs tend to spend most of their time executing a small portion of their overall code, compiling only the "hot" portions of the program may provide performance similar to fully compiled code, but without the start-up penalty. Furthermore, although the optimization analysis is constrained to the "hot" block being replaced, there is still far greater optimization potential than converting each instruction individually. There are a number of variations on the above-described example, such as tiered compilation.
To provide a clear example, source code file 101 has been illustrated as a "top level" representation of a program to be executed by execution platform 112. Although the computing architecture 100 depicts the source code file 101 as a "top-level" program representation, in other embodiments the source code file 101 may be an intermediate representation received via a "higher-level" compiler that processes code files of different languages into the language of the source code file 101. Some examples in the following disclosure assume that source code file 101 complies with a class-based object-oriented programming language. However, this is not a requirement to utilize the features described herein.
In an embodiment, compiler 102 receives source code file 101 as input and converts source code file 101 into class file 103 in a format desired by virtual machine 104. For example, in the context of a JVM, the Java virtual machine specification defines the particular class file format that class file 103 is expected to follow. In some embodiments, class file 103 contains virtual machine instructions that have been converted from source code file 101. However, in other embodiments, class file 103 may also contain other structures, such as tables identifying constant values and/or metadata associated with various structures (classes, fields, methods, etc.).
The following discussion assumes that each of the class files 103 represents a respective "class" defined in the source code file 101 (or dynamically generated by the compiler 102/virtual machine 104). However, the foregoing assumptions are not stringent requirements and will depend on the implementation of virtual machine 104. Thus, the techniques described herein may still be performed regardless of the exact format of class file 103. In some embodiments, class files 103 are partitioned into one or more "libraries" or "packages" each of which includes a collection of classes that provide related functionality. For example, a library may contain one or more class files that implement input/output (I/O) operations, math tools, cryptography, graphics utilities, and the like. In addition, some classes (or fields/methods in these classes) may include access restrictions that restrict the use of the class within a particular class/library/package or to classes with appropriate rights.
2.1 example class File Structure
FIG. 2 illustrates an example structure of a class file 200 in block diagram form in accordance with an embodiment. To provide a clear example, the remainder of this disclosure assumes that class file 103 of computing architecture 100 follows the structure of example class file 200 described in this section. However, in a practical environment, the structure of class file 200 will depend on the implementation of virtual machine 104. Further, one or more features discussed herein may modify the structure of the class file 200, for example, to add additional structure types. Thus, the exact structure of class file 200 is not critical to the techniques described herein. For purposes of section 2.1, a "class" or "current class" refers to the class represented by class file 200.
In FIG. 2, class file 200 includes a constant table 201, a field structure 208, class metadata 207, and a method structure 209. In an embodiment, constant table 201 is a data structure that serves as a symbol table for a class, among other functions. For example, the constant table 201 may store data, such as type, scope, content, and/or location, related to various identifiers used in the source code file 101. The constant table 201 has entries for a value structure 202 (representing constant values of types int, long, double, float, byte, string, etc.), a class information structure 203, a name and type information structure 204, a field reference structure 205, and a method reference structure 206 derived from the source code file 101 by the compiler 102. In an embodiment, the constant table 201 is implemented as an array mapping index i to structure j. However, the exact implementation of the constant table 201 is not critical.
In some embodiments, the entries of the constant table 201 include structures that index other constant table 201 entries. For example, an entry for one of the value structures 202 representing a string may hold a tag identifying its "type" as a string and an index to one or more other value structures 202 in the constant table 201 that store char, byte, or int values representing the ASCII characters of the string.
In an embodiment, the field reference structure 205 of the constant table 201 maintains an index in the constant table 201 that points to one of the class information structures 203 that represent the class that defines the field, and an index in the constant table 201 that points to one of the name and type information structures 204 that provides the name and descriptor of the field. The method reference structure 206 of the constant table 201 holds an index in the constant table 201 pointing to one of the class information structures 203 representing the class defining the method, and holds an index in the constant table 201 pointing to one of the name and type information structures 204 providing the name and descriptor of the method. Class information structure 203 maintains an index in constant table 201 that points to one of the value structures 202 that holds the name of the associated class.
The name and type information structure 204 maintains an index into the constant table 201 that points to one of the value structures 202 storing the names of the fields/methods, and an index into the constant table 201 that points to one of the value structures 202 storing the descriptors.
In an embodiment, class metadata 207 includes metadata for the class, such as version number(s), number of entries in the constant pool, number of fields, number of methods, access flag (whether the class is public, private, final, abstract, etc.), index to one of class information structures 203 of constant table 201 identifying the current class, index to one of class information structures 203 of constant table 201 identifying superclass (if any), etc.
In an embodiment, field structure 208 represents a set of structures that identify various fields of a class. The field structure 208 stores, for each field of a class, an accessor flag (whether the field is static, public, private, final, etc.), an index in the constant table 201 to one of the value structures 202 holding the name of the field, and an index in the constant table 201 to one of the value structures 202 holding the descriptor of the field.
In an embodiment, method structure 209 represents a set of structures that identify various methods of a class. The method structure 209 stores, for each method of the class, an accessor flag of the method (e.g., whether the method is static, public, private, synchronized, etc.), an index of one of the value structures 202 that holds the name of the method in the constant table 201, an index of one of the value structures 202 that holds the descriptor of the method in the constant table 201, and a virtual machine instruction corresponding to the body of the method defined in the source code file 101.
In an embodiment, the descriptor indicates the type of field or method. For example, the descriptor may be implemented as a string following a specific syntax. Although the exact syntax is not important, several examples will be described below.
In an example where the descriptor represents the type of a field, the descriptor identifies the type of data held by the field. In an embodiment, a field may hold a base type, an object, or an array. When a field holds a base type, the descriptor is a string that identifies the base type (e.g., "B" = byte, "C" = char, "D" = double, "F" = float, "I" = int, "J" = long, etc.). When a field holds an object, the descriptor is a string that identifies the class name of the object (e.g., "L ClassName"). In this case, "L" indicates a reference, and thus "L ClassName" represents a reference to an object of class ClassName. When the field is an array, the descriptor identifies the type held by the array. For example, "[B" indicates an array of bytes, where "[" indicates an array and "B" indicates that the array holds the base type byte. However, since arrays may be nested, the descriptor of an array may also indicate the nesting. For example, "[[L ClassName" indicates an array in which each index holds an array of objects of class ClassName. In some embodiments, the ClassName is fully qualified and includes both the simple name of the class and the path name of the class. For example, the ClassName may indicate where the class file 200 for that class is stored in the package, library, or file system hosting the class file 200.
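For instance, the illustrative class below (not taken from the disclosure) shows the descriptor a Java compiler would actually record for each field; note that in the real class-file format object descriptors use fully qualified names and a trailing ";" (e.g., "Ljava/lang/String;"), whereas the text above abbreviates them.

```java
// Illustrative only: field declarations and the descriptors recorded for them.
class FieldDescriptorExample {
    int count;          // descriptor: I
    double ratio;       // descriptor: D
    String label;       // descriptor: Ljava/lang/String;
    byte[] buffer;      // descriptor: [B
    String[][] grid;    // descriptor: [[Ljava/lang/String;
}
```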
In the case of a method, the descriptor identifies the parameters of the method and the return type of the method. For example, a method descriptor may follow the general form "({ParameterDescriptor}) ReturnDescriptor", where {ParameterDescriptor} is a list of field descriptors representing the parameters and ReturnDescriptor is a field descriptor identifying the return type. For instance, the string "V" may be used to represent the void return type. Thus, a method defined in the source code file 101 as "Object m(int i, double d, Thread t) {...}" matches the descriptor "(I D L Thread) L Object".
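Similarly, here are a few illustrative method signatures together with the descriptors the class file would actually contain (again using the real class-file notation with fully qualified names):

```java
// Illustrative only: methods and the descriptors recorded for them.
class MethodDescriptorExample {
    void run() { }                              // descriptor: ()V
    int add(int a, int b) { return a + b; }     // descriptor: (II)I
    Object m(int i, double d, Thread t) {       // descriptor: (IDLjava/lang/Thread;)Ljava/lang/Object;
        return null;
    }
}
```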
In an embodiment, the virtual machine instructions maintained in method structure 209 include operations that reference entries of constant table 201. Using Java as an example, consider the following classes:
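The class listing itself is not reproduced in this text; based on the description in the next paragraph, it presumably resembled the following:

```java
class A {
    int add12and13() {
        return B.addTwo(12, 13);
    }
}

class B {
    static int addTwo(int a, int b) {
        return a + b;
    }
}
```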
In the above example, the Java method add12and13 is defined in class A, takes no parameters, and returns an integer. The body of method add12and13 calls the static method addTwo of class B, passing the constant values 12 and 13 as parameters, and returns the result. Thus, in the constant table 201, the compiler 102 includes, among other entries, a method reference structure corresponding to the call to method B.addTwo. In Java, a call to a method is compiled down to an invoke command in the bytecode of the JVM (in this case invokestatic, because addTwo is a static method of class B). The invoke command is provided with an index into the constant table 201 corresponding to the method reference structure that identifies the class defining addTwo ("B"), the name of addTwo ("addTwo"), and the descriptor of addTwo ("(II)I"). For example, assuming the above method reference is stored at index 4, the bytecode instruction may appear as "invokestatic #4".
Since the constant table 201 uses structures carrying identifying information to symbolically reference classes, methods, and fields, rather than referencing memory locations directly, the entries of the constant table 201 are referred to as "symbolic references". One reason symbolic references are used in class file 103 is because, in some embodiments, compiler 102 does not know how and where a class will be stored once it is loaded into runtime environment 113. As will be described in section 2.3, after the referenced classes (and associated structures) have been loaded into the runtime environment and allocated concrete memory locations, the runtime representations of the symbolic references are eventually resolved to actual memory addresses by the virtual machine 104.
2.2 example virtual machine architecture
FIG. 3 illustrates an example virtual machine memory layout 300 in block diagram form, according to an embodiment. To provide a clear example, the remainder of the discussion will assume that virtual machine 104 follows virtual machine memory layout 300 depicted in FIG. 3. Further, while components of virtual machine memory layout 300 may be referred to as memory "regions," it is not required that memory regions be contiguous.
In the example illustrated in fig. 3, virtual machine memory layout 300 is divided into shared region 301 and thread region 307. The shared area 301 represents an area in memory in which structures shared between various threads executing on the virtual machine 104 are stored. The shared area 301 includes a heap 302 and a per-class area 303. In an embodiment, heap 302 represents a region of runtime data from which memory for class instances and arrays is allocated. In an embodiment, the per-class region 303 represents a storage region in which data related to a single class is stored. In an embodiment, per-class region 303 includes, for each loaded class: a runtime constant pool 304 representing data from the constant table 201 of the class, field and method data 306 (e.g., to maintain static fields of the class), and method code 305 representing virtual machine instructions for methods of the class.
Thread area 307 represents a memory area that stores structures specific to each thread. In FIG. 3, thread region 307 includes thread structure 308 and thread structure 311, which represent per-thread structures utilized by different threads. To provide a clear example, the thread region 307 depicted in FIG. 3 assumes that two threads are executing on the virtual machine 104. However, in a practical environment, virtual machine 104 may execute any arbitrary number of threads, with the number of thread structures scaled accordingly.
In an embodiment, thread structure 308 includes a program counter 309 and a virtual machine stack 310. Similarly, thread structure 311 includes a program counter 312 and a virtual machine stack 313. In an embodiment, program counter 309 and program counter 312 store the current address of virtual machine instructions being executed by their respective threads.
Thus, as a thread steps through instructions, the program counter is updated to maintain an index to the current instruction. In an embodiment, virtual machine stack 310 and virtual machine stack 313 each store a frame (frame) for their respective threads that holds local variables and partial results and is also used for method calls and returns.
In an embodiment, a frame is a data structure used to store data and partial results, to return values for methods, and to perform dynamic linking. A new frame is created each time a method is invoked, and the frame is destroyed when the method that generated it completes. Thus, when a thread performs a method call, virtual machine 104 generates a new frame and pushes the frame onto the virtual machine stack associated with the thread.
When the method call completes, the virtual machine 104 passes the result of the method call back to the previous frame and pops the current frame off the stack. In an embodiment, one frame is active at any given point for a given thread. This active frame is referred to as the current frame, the method that generated the current frame is referred to as the current method, and the class to which the current method belongs is referred to as the current class.
Fig. 4 illustrates an example frame 400 in block diagram form, according to an embodiment. To provide a clear example, the remainder of the discussion will assume that the frames of virtual machine stack 310 and virtual machine stack 313 adhere to the structure of frame 400.
In an embodiment, frame 400 includes local variables 401, operand stacks 402, and a runtime constant pool reference table 403. In an embodiment, the local variables 401 are represented as an array of variables, each holding a value, e.g., boolean, byte, char, short, int, float or reference value. In addition, some value types (such as long or double) may be represented by more than one entry in the array. The local variables 401 are used to pass parameters on method calls and store partial results. For example, when frame 400 is generated in response to a calling method, parameters may be stored in predefined locations within local variables 401, such as indexes 1-N corresponding to the first through N-th parameters in the call.
In an embodiment, when virtual machine 104 creates frame 400, the operand stack 402 is empty by default. The virtual machine 104 then supplies instructions from the method code 305 of the current method that load constants or values from the local variables 401 onto the operand stack 402. Other instructions take operands from the operand stack 402, operate on them, and push the results back onto the operand stack 402. In addition, the operand stack 402 is used to prepare parameters to be passed to methods and to receive method results. For example, the parameters of a method being invoked may be pushed onto the operand stack 402 before the call to the method is made. Virtual machine 104 then generates a new frame for the method call, where the operands on the operand stack 402 of the previous frame are popped and loaded into the local variables 401 of the new frame. When the invoked method terminates, the new frame is popped from the virtual machine stack and the return value is pushed onto the operand stack 402 of the previous frame.
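As a concrete illustration (not taken from the disclosure) of the operand stack at work, consider a simple addition and the bytecode a typical Java compiler emits for it:

```java
// Illustrative only: the operand stack during "int c = a + b;".
class OperandStackExample {
    int sum(int a, int b) {   // local 0 = this, local 1 = a, local 2 = b, local 3 = c
        int c = a + b;        // iload_1, iload_2, iadd, istore_3
        return c;             // iload_3, ireturn
    }
}
// Stack effect of the addition:
//   iload_1    push a             operand stack: [a]
//   iload_2    push b             operand stack: [a, b]
//   iadd       pop two, push sum  operand stack: [a + b]
//   istore_3   pop into local 3   operand stack: []
```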
In an embodiment, runtime constant pool reference table 403 contains references to runtime constant pool 304 for the current class. Runtime constant pool reference table 403 is used to support parsing. Parsing is the process of translating symbol references in the constant pool 304 into specific memory addresses, loading classes as needed to parse undefined symbols and translate variable accesses into appropriate offsets in the memory structure associated with the runtime locations of these variables.
2.3 Loading, linking and initializing
In an embodiment, virtual machine 104 dynamically loads, links, and initializes classes. Loading is the process of finding the class with a particular name and creating a representation of the class within the memory of the runtime environment 113 from the class's associated class file 200. For example, a runtime constant pool 304, method code 305, and field and method data 306 are created for the class within the per-class region 303 of the virtual machine memory layout 300. Linking is the process of taking the in-memory representation of the class and combining it with the runtime state of virtual machine 104 so that the methods of the class can be executed. Initialization is the process of executing the class constructors to set the starting state of the field and method data 306 of the class and/or to create class instances of the initialized class on the heap 302.
The following are examples of loading, linking, and initialization techniques that may be implemented by virtual machine 104. However, in many embodiments, these steps may be staggered such that an initial class is loaded, then during linking, a second class is loaded to resolve symbolic references found in the first class, which in turn results in a third class being loaded, and so on. Thus, the progress through the loading, linking, and initialization phases may vary from class to class. In addition, some embodiments may delay ("lazy" execution) one or more functions of the loading, linking, and initializing process until such time as the class is actually needed. For example, resolution of a method reference may be delayed until a virtual machine instruction that invokes the method is executed. Thus, the exact time to perform the steps will vary greatly between implementations for each class.
To begin the loading process, the virtual machine 104 is started by calling a class loader 107 that loads the original class. The technique of specifying the initial class will vary from embodiment to embodiment. For example, one technique may be to have the virtual machine 104 accept command line parameters specifying an initial class at startup.
To load a class, the class loader 107 parses the class file 200 corresponding to the class and determines whether the class file 200 is well-structured (meets the syntactic expectations of the virtual machine 104). If not, class loader 107 generates an error. For example, in Java, an error may be generated in the form of an exception, where the exception is thrown to an exception handler for processing. Otherwise, class loader 107 generates a memory representation of the class by assigning a runtime constant pool 304, method code 305, and field and method data 306 for the class within per-class region 303.
In some embodiments, when class loader 107 loads a class, class loader 107 also recursively loads the superclass of the loaded class. For example, the virtual machine 104 may ensure that superclasses of a particular class are loaded, linked, and/or initialized before continuing the loading, linking, and initializing process of the particular class.
During linking, the virtual machine 104 verifies the class, prepares the class, and performs resolution of the symbolic references defined in the runtime constant pool 304 of the class.
To verify the class, the virtual machine 104 checks whether the in-memory representation of the class is structurally correct. For example, the virtual machine 104 may check that each class other than the generic class Object has a superclass, check that final classes have no subclasses and that final methods are not overridden, check that the constant pool entries are consistent with one another, check that the current class has correct access permissions for the classes/fields/structures referenced in the constant pool 304, check that the virtual machine 104 code of the methods will not cause unexpected behavior (e.g., ensuring that a jump instruction does not send the virtual machine 104 beyond the end of the method), and so forth. The exact checks performed during verification depend on the implementation of virtual machine 104. In some cases, verification may cause additional classes to be loaded, but it does not necessarily require those classes to also be linked before proceeding. For example, assume class A contains a reference to a static field of class B. During verification, virtual machine 104 may check class B to ensure that the referenced static field actually exists, which may cause class B to be loaded, but not necessarily cause class B to be linked or initialized. However, in some embodiments, certain verification checks may be delayed until a later phase, such as being checked during resolution of the symbolic references. For example, some embodiments may delay checking the access permissions for symbolic references until those references are resolved.
To prepare the class, virtual machine 104 initializes static fields located within the fields of the class and method data 306 to default values. In some cases, setting the static field to a default value may be different from running the constructor of the class. For example, the validation process may zero out static fields or set them to values that the constructor expects these fields to have during initialization.
During parsing, the virtual machine 104 dynamically determines a specific memory address from among the symbolic references included in the runtime constant pool 304 of classes. To parse the symbolic reference, virtual machine 104 loads the class identified in the symbolic reference (if not already loaded) with class loader 107. Once loaded, virtual machine 104 knows the memory location of the referenced class and its fields/methods within per-class region 303. Virtual machine 104 then replaces the symbolic reference with a reference to the specific memory location of the referenced class, field, or method. In an embodiment, virtual machine 104 caches the resolution for reuse in the event that the same class/name/descriptor is encountered while virtual machine 104 processes another class. For example, in some cases, class a and class B may call the same method of class C. Thus, when parsing class a is performed, the results may be cached and reused during parsing of the same symbolic references in class B to reduce overhead.
In some embodiments, the step of parsing the symbolic reference during linking is optional. For example, embodiments may perform symbol parsing in a "lazy" manner, delaying the step of parsing until virtual machine instructions requiring the referenced class/method/field are executed.
During initialization, virtual machine 104 performs a constructor of the class to set the start state of the class. For example, the initialization may initialize the fields of the class and method data 306 and generate/initialize any class instances created by the constructors on the heap 302. For example, the class file 200 for a class may specify that a particular method is a constructor for setting a start state. Thus, during initialization, virtual machine 104 executes instructions of the constructor.
In some embodiments, virtual machine 104 performs parsing of field and method references by initially checking whether the field/method is defined in the referenced class. Otherwise, virtual machine 104 recursively searches the superclass of the referenced class for the referenced field/method until the field/method is located or the top-level superclass is reached (in which case an error is generated).
3. Garbage collection
FIG. 5 illustrates an execution engine and heap memory of a virtual machine, according to an embodiment. As shown in fig. 5, system 500 includes an execution engine 502 and a heap 530. The system 500 may include more or fewer components than are shown in fig. 5. The components shown in fig. 5 may be local or remote from each other.
In one or more embodiments, heap 530 represents a runtime data area from which memory for class instances and arrays is allocated. An example of a heap 530 is described above as heap 302 in fig. 3.
Heap 530 stores objects 534a-534d created during execution of an application. An object stored in heap 530 may be an ordinary object, an object array, or an object of another type. An ordinary object is a class instance. A class instance is explicitly created by a class instance creation expression. An object array is a container object that holds a fixed number of values of a single type; an object array is itself a particular kind of object.
Heap 530 stores surviving objects 534b, 534d (indicated by the dotted-line pattern) and unused objects 534a, 534c (also referred to as "dead objects", indicated by the blank pattern). An unused object is an object that is no longer used by any application. A surviving object is an object that is still being used by at least one application. An object is still being used by an application if it is (a) pointed to by a root reference or (b) traceable from another object that is pointed to by a root reference. A first object is "traceable" from a second object if a reference to the first object is contained in the second object.
The sample code may include the following:
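The listing itself is not reproduced in this text; based on the description in the next paragraph, it presumably resembled:

```java
class Person {
    String name;
    int age;
}

class Sample {
    void run() {                      // executed by application thread 508a
        Person temp = new Person();   // allocates a Person object on the heap
        temp.age = 3;                 // the int value 3 is stored directly in the object
        temp.name = "Sean";           // stores a reference to a separate String object
    }
}
```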
the application thread 508a executing the sample code creates an object temp in the heap 530. The object temp is a Person type and includes two fields. Since field age is an integer, the portion of heap 530 allocated to temp directly stores the value "3" of field age. Since the field name is a character string, the portion of heap 530 allocated to temp does not directly store the value of the name field; instead, the portion of heap 530 allocated to temp stores a reference to another String type object. The String object stores the value "Sean". The String object is said to be "trackable" from the Person object.
In one or more embodiments, the execution engine 502 includes one or more threads configured to perform various operations. As shown, for example, the execution engine 502 includes Garbage Collection (GC) threads 506a-b and application threads 508a-b.
In one or more embodiments, the application threads 508a-b are configured to perform the operations of one or more applications. The application threads 508a-b create objects during runtime that are stored on the heap 530. The application threads 508a-b may also be referred to as "mutators", since the application threads 508a-b may mutate the heap 530 (during concurrent phases of GC cycles and/or between GC cycles).
In one or more embodiments, the GC threads 506a-b are configured to perform garbage collection. The GC threads 506a-b may iteratively perform GC cycles based on scheduling and/or event triggers, such as when a threshold allocation of a heap (or region thereof) is reached. The GC cycle includes a set of GC operations that are used to reclaim memory locations in the heap that are occupied by unused objects.
In an embodiment, multiple GC threads 506a-b may perform GC operations in parallel. Multiple GC threads 506a-b operating in parallel may be referred to as a "parallel collector".
In an embodiment, the GC threads 506a-b may perform at least some GC operations concurrently with the execution of the application threads 508a-b. GC threads 506a-b that operate concurrently with the application threads 508a-b may be referred to as a "concurrent collector" or a "partially concurrent collector".
In an embodiment, the GC threads 506a-b may perform generational garbage collection. The heap is divided into different regions. A first region (which may be referred to as the "young generation space") stores objects that have not yet met the criteria for promotion from the first region to the second region; a second region (which may be referred to as the "old generation space") stores objects that have met the criteria for promotion from the first region to the second region. For example, a surviving object is promoted from the young generation space to the old generation space when it has survived at least a threshold number of GC cycles.
Different GC approaches to performing garbage collection achieve different memory efficiency, time efficiency, and/or resource efficiency. In an embodiment, different GC processing may be performed for different heap regions. As an example, a heap may include a young generation space and an old generation space. One type of GC processing may be performed on the young generation space, and a different type of GC processing may be performed on the old generation space. Examples of different GC approaches are described below.
As a first example, a copying collector involves at least two separately defined address spaces of the heap, referred to as a "from-space" and a "to-space". The copying collector identifies surviving objects stored within the region defined as the from-space and copies the surviving objects to the region defined as the to-space. After all surviving objects have been identified and copied, the region defined as the from-space is reclaimed. New memory allocation may then begin at the first location of the original from-space.
Copying may be performed across at least three distinct areas within the heap: an Eden space and two survivor spaces, S1 and S2. Objects are initially allocated in the Eden space. When the Eden space is full, a GC cycle is triggered. Surviving objects are copied from the Eden space to one of the survivor spaces, for example S1. In the next GC cycle, surviving objects in the Eden space are copied to the other survivor space, which would be S2. In addition, the surviving objects in S1 are also copied to S2.
As another example, a mark-and-sweep collector divides GC operation into at least two phases: a mark phase and a sweep phase. During the mark phase, the mark-and-sweep collector marks each surviving object with a "live" bit. For example, the live bit may be a bit within the object header of the surviving object. During the sweep phase, the mark-and-sweep collector traverses the heap to identify all unmarked blocks of contiguous memory address space. The mark-and-sweep collector links the unmarked blocks together to form organized free lists. The unmarked blocks are reclaimed, and the free lists are used to perform new memory allocations: a new object may be stored in a memory block identified from a free list.
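A minimal, non-concurrent sketch of these two phases, assuming the collector tracks every allocated block and a per-object mark flag (the Block type and all names here are illustrative, not the patent's):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative mark-and-sweep sketch over an assumed Block abstraction.
final class MarkSweepSketch {
    static final class Block {
        boolean marked;                                   // the per-object "live" bit
        final List<Block> references = new ArrayList<>(); // outgoing references
    }

    final List<Block> allBlocks = new ArrayList<>();      // every allocated block
    final List<Block> freeList  = new ArrayList<>();      // reclaimed blocks

    void collect(List<Block> roots) {
        // Mark phase: flag everything reachable from the root references.
        Deque<Block> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Block b = work.pop();
            if (!b.marked) {
                b.marked = true;
                work.addAll(b.references);
            }
        }
        // Sweep phase: move unmarked blocks to the free list, clear marks on survivors.
        allBlocks.removeIf(b -> {
            if (!b.marked) {
                freeList.add(b);
                return true;
            }
            b.marked = false;
            return false;
        });
    }
}
```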
The mark-and-sweep collector may be implemented as a parallel collector. Additionally or alternatively, the mark-and-sweep collector may be implemented as a concurrent collector. Example phases within a GC cycle of a concurrent mark-and-sweep collector include:
Phase 1: identifying objects referenced by root references (not performed concurrently with application execution)
Phase 2: marking objects reachable from the objects referenced by the root references (may be concurrent)
Phase 3: identifying objects that have been modified as part of program execution during Phase 2 (may be concurrent)
Phase 4: re-marking the objects identified in Phase 3 (not concurrent)
Phase 5: sweeping the heap to obtain the free lists and reclaim memory (may be concurrent)
As another example, a compacting collector attempts to compact reclaimed memory regions. The heap is partitioned into a set of equally sized heap regions, each a contiguous range of virtual memory. The compacting collector performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the marking phase completes, the compacting collector identifies regions that are mostly empty and collects these regions first, which typically yields a large amount of free space. The compacting collector concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects (i.e., garbage). The compacting collector copies surviving objects from one or more regions of the heap to a single region of the heap, compacting and freeing memory in the process. This evacuation may be performed in parallel on multiple processors to reduce pause times and increase throughput.
Example phases within a GC cycle of a concurrent compacting collector include:
Phase 1: identifying objects referenced by root references (not performed concurrently with application execution)
Phase 2: marking objects reachable from the objects referenced by the root references (may be concurrent)
Phase 3: identifying objects that have been modified as part of program execution during Phase 2 (may be concurrent)
Phase 4: re-marking the objects identified in Phase 3 (not concurrent)
Phase 5: copying surviving objects from a source region to a destination region, thereby reclaiming the memory space of the source region (not concurrent)
As another example, a load barrier collector marks and compacts surviving objects, but remaps references to relocated objects lazily. The load barrier collector relies on embedding a "color" within the references stored on the heap. The color represents a GC state and tracks the progress of GC operations with respect to the reference. The color is captured by metadata stored in certain bits of the reference.
At any given moment, all GC threads 506a-b agree on which color is the "good color", or "good GC state". A GC thread 506a-b that loads a reference from the heap 530 onto its call stack first applies a check to determine whether the current color of the reference is good. Similarly, an application thread 508a-b that loads a reference from the heap 530 onto its call stack first applies the same check. This check may be referred to as a "load barrier". A reference with a good color hits the fast path, which incurs no additional work. Otherwise, the reference hits the slow path. The slow path involves a set of GC operations that bring the reference from its current GC state into the good GC state. The slot where the reference resides in the heap 530 is then updated with a good-colored alias to avoid subsequent hits on the slow path (updating to the good color may also be referred to as "self-healing").
For example, a stale reference (a reference to an object that has been moved concurrently during compaction, meaning that the address may point to an outdated copy of the object, to another object, or to nothing at all) is guaranteed not to have a good color. An application thread attempting to load such a reference from the heap first executes the load barrier. Through the load barrier, the reference is identified as stale (not a good color). The reference is therefore updated to point to the new location of the object and is associated with a good color. The reference with the updated address and good color is stored back into the heap. The reference with the updated address may also be returned to the application thread; however, the reference returned to the application thread does not necessarily include any color.
Additional and/or alternative types of GC processing may be used than those described above. Other types of GC processing may also rely on the "color" of the reference, or metadata related to garbage collection stored within the reference.
In an embodiment, the color is stored with heap references but not with dereferenceable references. The term "heap reference" refers to a reference stored on the heap 530. The term "dereferenceable reference" refers to a reference that the execution engine uses to access the value of the object to which the reference points. Obtaining the value of the object to which a reference points is referred to as "dereferencing" the reference. A GC thread 506a-b attempting to dereference a reference stored on the heap 530 first loads the reference from the heap 530 onto the call stack of the GC thread 506a-b. An application thread 508a-b attempting to dereference a reference stored on the heap 530 first loads the reference from the heap 530 onto the call stack of the application thread 508a-b. (For example, the application thread loads the reference into a local variable 401 within a frame 400 of the call stack, as described above with reference to FIG. 4.) Heap references and/or dereferenceable references are generally referred to herein as "references".
FIG. 6 illustrates heap references and dereferenceable references, in accordance with an embodiment. A reference may include any number of bits, depending on the computing environment. For example, on an Intel x86-64 machine, a reference has 64 bits.
In an embodiment, the dereferenceable reference 600 includes a non-addressable portion 602 and an addressable portion 604. The addressable portion 604 defines the maximum address space that the reference 600 can reach. Depending on the hardware system on which the application executes, the non-addressable portion 602 may be required to adhere to a canonical form before the reference 600 is dereferenced. If such a requirement is imposed, the hardware system (such as a processor) may generate an error when attempting to dereference a non-compliant dereferenceable reference. Hence, the non-addressable portion 602 of the reference 600 cannot be used to store any GC-related metadata, such as GC state. For example, on an Intel x86-64 machine, the addressable portion of a reference has 48 bits and the non-addressable portion has 16 bits. Based on this hardware-imposed constraint, the reference can reach at most 2^48 unique addresses. The canonical form requires that the non-addressable portion be a sign extension 610 of the value stored in the addressable portion (i.e., the higher-order bits 48-63 must be copies of the value stored in bit 47).
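A small illustrative check of this canonical-form requirement (the class and method names here are hypothetical): shifting the low 48 bits up and arithmetically shifting them back sign-extends bit 47, so the result equals the original value only for canonical pointers.

```java
// Illustrative x86-64 canonical-form check for a 64-bit pointer value.
final class CanonicalAddress {
    static boolean isCanonical(long pointer) {
        // (pointer << 16) >> 16 sign-extends bit 47 into bits 48-63.
        return ((pointer << 16) >> 16) == pointer;
    }
}
```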
As shown, the addressable portion 604 includes an address 606 and optionally other bits 608. Address 606 refers to the address of the object to which reference 600 points. Other bits 608 may be unused. Alternatively, other bits 608 may store metadata, which may be, but need not be, related to garbage collection.
As described above, dereferenceable references 600 include references stored on a call stack. Additionally or alternatively, dereferenceable references 600 include references embedded within compiled methods stored in a code cache and/or other memory locations. A compiled method is a method that has been converted from a high-level language (such as bytecode) to a low-level language (such as machine code). An application thread may directly access the compiled method within the code cache or other memory location to execute the compiled method. As an example, the compiled method may be generated by JIT compiler 109 of FIG. 1. As another example, the compiled method may be generated by another component of the virtual machine.
In an embodiment, heap reference 650 includes transient color bits 652, address bits 606, and optionally other bits 608. Transient color 652 represents a GC state that tracks progress of GC operations relative to reference 650. Color 652 is "transient" in that color 652 need not remain with the reference when the reference is loaded from heap 530 to the call stack. Other bits 608 may be unused. Alternatively, other bits 608 may store metadata, which may be, but need not be, related to garbage collection. In an embodiment, transient color 652 is stored in the lowest (rightmost) bit of heap reference 650. For example, transient color 652 may be two bytes in length and stored in bits 0-15 of heap reference 650.
In an embodiment, transient color 652 includes one or more remap bits 654. In an embodiment, the remap bits 654 provide, for each generation of the GC, an indication of the current relocation phase of that generation. In an embodiment, the GC includes two generations (e.g., a young generation and an old generation), and the remap bits include a number of bits sufficient to describe the current relocation phases of both the young and old generations. For example, the remap bits may include 4 bits. In an embodiment, the remap bits 654 are stored in the highest portion of transient color 652. For example, where transient color 652 is stored in bits 0-15 of heap reference 650, the remap bits 654 may constitute bits 12-15 of heap reference 650.
Transient color 652 may optionally include additional color bits, including one or more mark bits 656, one or more remembered set bits 658, and one or more other bits 660. In an embodiment, the remap bits 654 may represent a relocation phase of the GC. In a multi-generational GC, the remap bits 654 may represent the relocation phase of each generation of the GC. The remap bits are described in more detail below.
In an embodiment, the mark bits 656 may represent a mark parity for the GC. In a multi-generational GC, the mark bits 656 may include representations of the mark parity for different generations of the GC. For example, in a GC that includes a young generation and an old generation, the mark bits 656 may include two bits representing mark parity in the young generation and two bits representing mark parity in the old generation. In another example embodiment, the mark bits 656 may include a first set of bits representing mark parity for young-generation GC operations, and a second set of bits representing mark parity for full-heap GC operations (which may include only the old generation, or both the old and young generations).
In an embodiment, the remembered set bits 658 may represent the remembered set phase of the GC. As a specific example, the remembered set bits may be two bits, with a single set bit representing the current remembered set phase. Remembered set bits indicate potential references from the old generation into the young generation.
In embodiments, the other bits 660 may be used to represent other characteristics of the GC state. Alternatively, the other bits 660 may be unused. In some embodiments, the number of other bits 660 may be chosen such that the number of bits in transient color 652 is a whole number of bytes (e.g., the number of bits is divisible by 8). For example, the number of bits in transient color 652 may be 8 bits or 16 bits. In yet another embodiment, transient color 652 may collectively represent a set of different GC states. Transient color 652 may represent GC states used in additional and/or alternative types of GC processing.
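The bit positions described above can be collected into a small sketch. The following Java constants are illustrative only; the layout (remap bits 12-15, mark bits 8-11, remembered set bits 6-7) follows the example positions given in this description and is not mandated by the design.

public final class TransientColorLayout {
    static final long COLOR_MASK       = 0xFFFFL; // bits 0-15: the whole transient color
    static final long REMAP_BITS_MASK  = 0xF000L; // bits 12-15: relocation phase per generation
    static final long MARK_BITS_MASK   = 0x0F00L; // bits 8-11: mark parity per generation
    static final long REMSET_BITS_MASK = 0x00C0L; // bits 6-7: remembered set phase
    static final long OTHER_BITS_MASK  = 0x003FL; // bits 0-5: unused or other metadata

    // Extracts the transient color from a heap reference held in a long.
    static long colorOf(long heapReference) {
        return heapReference & COLOR_MASK;
    }
}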
In an embodiment, a GC cycle may include multiple phases. In some embodiments, the GC system may include a separate GC cycle for each generation defined in the heap. For example, a GC system may include a young-generation cycle and an old-generation cycle. The young-generation GC cycle may include the following phases: mark start, concurrent mark, relocation start, concurrent relocation. In some embodiments, the old-generation GC cycle is symmetrical with the young-generation GC cycle and may include the same phases. In some embodiments, each phase executes concurrently, meaning that one or more application threads 508a, 508b may continue to execute during that phase. In other embodiments, one or more of the phases (e.g., mark start, relocation start) may be non-concurrent. All application threads 508a-b must pause during a non-concurrent phase (also referred to as a "stop-the-world pause" or "STW pause"). In some embodiments, a GC cycle (e.g., a young-generation GC cycle or an old-generation GC cycle) begins when the objects on the heap allocated to a particular generation exceed a storage threshold, or after a certain period of time has elapsed without a GC cycle.
The following is a detailed discussion of the stages. Additional and/or alternative operations in addition to those discussed below may also be performed in each stage.
Mark start: In the mark start phase, the GC updates one or more constants (e.g., the "good color") by updating the mark parity and/or the remembered set parity for the young generation. During the mark start phase, the GC may capture a snapshot of the remembered set data structure.
Concurrent mark: The GC threads 506a-b perform a traversal of the object graph to identify and mark all live objects. The GC threads trace through the transitive closure of heap 530, truncating any traversal that leaves the young generation. If a stale reference is found in heap 530 during this process, the reference is updated with the current address of the object to which it refers. The reference in heap 530 is also updated to indicate a good color.
Optionally, per-page liveness information (the total number and total size of live objects on each memory page) is recorded. The liveness information may be used to select pages for relocation.
Mark end: The GC threads 506a-b mark any queued objects, trace through the transitive closure of the queued objects, and confirm that marking is complete.
Relocation start: During the relocation start phase, the GC updates one or more constants (e.g., the "good color") by updating at least the remap bits. In an embodiment, the GC threads 506a-b select empty regions as the target space. In another embodiment, additional and/or alternative methods may be used to select a target space for relocated objects.
Concurrent relocation: Marked objects in the source space may be relocated to the selected target space (in some cases, in-place compaction may be used). Each object that is moved and contains a stale pointer into the generation currently being relocated is added to the remembered set. This helps ensure that such pointers are subsequently remapped.
4. Load barrier and write barrier
In one or more embodiments, the GC period includes one or more concurrent phases. During the concurrency phase, one or more application threads may execute concurrently with one or more GC threads. When an application thread attempts to load a reference from a heap to a call stack, the application thread may execute a reference load barrier. When an application thread attempts to write a reference to a heap, the application thread may execute a reference write barrier.
FIG. 7 illustrates a reference load barrier according to an embodiment. As shown, heap 730 includes addresses 00000008, 00000016, ..., 00000048, 00000049, 00000050. The call stack local variables 732 include registers r1, r2, r3. In this example, a reference includes 32 bits. The color of a heap reference may be indicated by bits 0-15. For example, a color may include 4 remap bits (e.g., bits 12-15) for indicating the relocation phases of the young and old generations; 4 mark bits (e.g., bits 8-11) for indicating mark parity in the young and old generations; two remembered set bits (e.g., bits 6-7) for indicating the remembered set parity of the GC; and six other bits (bits 0-5) that may be unused or may store other metadata.
Regarding the remap bits, these bits may use an encoding in which exactly one of the four remap bits is set, with the one set bit indicating the relocation phase of both young-generation GC operations and full-heap GC operations (which may include only the old generation, or both the old and young generations). In particular, the four remap bits may be read as a four-digit binary number. A value of 0001 may indicate that the full-heap relocation is in an even phase and the young-generation relocation is in an even phase; a value of 0010 may indicate that the full-heap relocation is in an even phase and the young-generation relocation is in an odd phase; a value of 0100 may indicate that the full-heap relocation is in an odd phase and the young-generation relocation is in an even phase; a value of 1000 may indicate that the full-heap relocation is in an odd phase and the young-generation relocation is in an odd phase. Thus, the four possible values that include exactly one set bit represent each possible combination of relocation phases for the old and young generations.
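A sketch of this one-hot encoding, expressed as Java constants, is shown below; the constant names are hypothetical and the bit positions follow the bits 12-15 example used here.

public final class RemapEncoding {
    // Exactly one of the four remap bits (bits 12-15 of the reference) is set;
    // which bit is set encodes the relocation phases of both collections.
    static final long FULL_EVEN_YOUNG_EVEN = 0b0001L << 12; // 0001: full heap even, young even
    static final long FULL_EVEN_YOUNG_ODD  = 0b0010L << 12; // 0010: full heap even, young odd
    static final long FULL_ODD_YOUNG_EVEN  = 0b0100L << 12; // 0100: full heap odd, young even
    static final long FULL_ODD_YOUNG_ODD   = 0b1000L << 12; // 1000: full heap odd, young odd
}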
The GC may also set a shift value that is 1 higher than the position of the remap bit that is set in the current good color. This ensures that that particular bit is the last bit shifted out of the address. For example, assuming the remap bits are bits 12-15, the shift value may be set to a value between 13 and 16, where a shift value of 13 corresponds to remap bit 12 being set, 14 corresponds to bit 13 being set, 15 corresponds to bit 14 being set, and 16 corresponds to bit 15 being set. In an embodiment, the shift value changes at least at the beginning of each new GC relocation phase and may be set using, for example, compiled method entry barrier patching.
In an embodiment, the address portion of the reference may overlap the color bits, beginning immediately after the set remap bit. Thus, the address portion of the reference may start anywhere between bit 13 and bit 16, depending on which remap bit is set. However, any bit contained within the overlap is set to zero. Consequently, this approach requires that the three least significant bits of each address be zero.
The sample code may include the following:
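The listing itself is not reproduced here; based on the description that follows, a plausible reconstruction (the class shape and the field initialization are assumptions) is:

class Person {
    String name;
}

Person temp1 = new Person();  // allocates a Person object on the heap (address 00000008 in FIG. 7)
temp1.name = "TOM";           // the name field holds a reference to the String "TOM"
String temp2 = temp1.name;    // loading this reference from the heap triggers the reference load barrier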
based on the code line Person temp1 = new Person (), the application thread creates a new object in heap 730 and reference temp1 references the new object. The object (referenced by temp 1) belongs to the Person type and contains a name field of the String type. The object (referenced by temp 1) is stored at address "00000008" within heap 730. The name field of the object (referenced by temp 1) is stored at address "00000016" within heap 730. The name field is filled with a reference 705. Reference 705 includes color 706 and points to address "0048". Thus, address "00000048" contains the value of the name of the object (referenced by temp 1), and this value is "TOM".
Based on the code line String temp2 = temp1.name, the application thread attempts to load the reference 705 in the name field of the object referenced by temp1. The application thread hits the reference load barrier 710. The reference load barrier 710 includes instructions for checking whether the color 706 of reference 705 includes remap bits that match the current relocation phases of both the young and old generations. In particular, the instructions determine whether the correct bit among the remap bits is set.
To achieve this, a logical bitwise right-shift operation is applied to reference 705. The system may shift the reference right by n positions, where n equals the shift value set by the GC. Each bit is shifted right by n positions, and n bits with a default value are inserted into the leftmost (i.e., highest) bits. For example, if the canonical form requires the most significant bits to be 0, the shift operation may insert n 0s into the leftmost bits. Because color 706 is stored in the lowest (rightmost) bits of reference 705, a right-shift operation applied to the reference has the effect of removing the color bits 706. Further, because the remap bits are stored in the most significant portion of the color, the remap bits are the last bit or bits removed by the right-shift operation. In particular, the shift value set by the GC corresponds to the position of the single bit that is set in the current "good color" among the remap bits.
The system may then determine whether the last bit shifted out of the reference is set (i.e., whether the correct remap bit was set). For example, on the x86-64 architecture, the system may examine the carry flag and the zero flag. After a bitwise shift operation on x86-64, the carry flag equals the last bit shifted out of the reference, and the zero flag is set if all bits remaining in the reference are 0 after the shift completes. Thus, the carry flag is set when the correct remap bit was set, and the zero flag is set when the reference is a reference to null (e.g., address 0). If neither the carry flag nor the zero flag is set, the application thread takes the slow path 714. Otherwise (i.e., the carry flag is set or the zero flag is set), the application thread takes the fast path 712. On other system architectures, other techniques may be used to determine whether the last bit shifted out of the reference is set.
The fast path 712 does not necessarily involve any GC operations, such as remapping the reference and/or marking the object as live. Color 706 has already been removed from reference 705 by the right-shift operation. The result "00000048" is saved as reference 707 in the call stack local variables 732, for example at r3. The application thread may then dereference reference 707. The application thread accesses the address indicated by reference 707, namely address "00000048" within heap 730, and obtains the value "TOM" stored there.
When the system determines that the application thread should take the slow path, the application thread may select one slow path from a pool of slow paths. In particular, the application thread may reload the reference and select a slow path from the pool based on color 706. For example, the application thread may remap the address indicated by reference 705. As another example, the application thread may mark the object to which reference 705 points as live. The application thread may then update the color 706 of reference 705 to a good color. In addition, the application thread may remove color 706 from reference 705 before storing the reference in the call stack local variables 732, as described above. In particular, the application thread may apply a logical bitwise right-shift operation to reference 705, shifting the reference right by n positions, where n equals the shift value set by the GC.
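The barrier logic described above can be modeled roughly in Java; the sketch below computes the carry and zero conditions explicitly (on x86-64 they fall out of the shift instruction), and the helper names and the slow-path placeholder are hypothetical.

final class LoadBarrierModel {
    // heapReference is the colored reference loaded from the heap; shift is the good shift
    // value, i.e., one higher than the position of the good remap bit.
    static long loadBarrier(long heapReference, int shift) {
        long colorless = heapReference >>> shift;                     // logical right shift removes the color
        boolean carry = ((heapReference >>> (shift - 1)) & 1L) == 1L; // last bit shifted out (the good remap bit)
        boolean zero = colorless == 0L;                               // shifted result is a null reference

        if (carry || zero) {
            return colorless; // fast path: the reference already has a good color (or is null)
        }
        return slowPath(heapReference, shift); // slow path: remap and/or mark, update the heap copy, strip the color
    }

    // Placeholder for the slow path: it would remap the address and/or mark the object, store a
    // good-colored reference back to the heap, and return the colorless reference.
    static long slowPath(long heapReference, int shift) {
        throw new UnsupportedOperationException("GC slow path is not modeled in this sketch");
    }
}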
FIG. 8 illustrates a reference write barrier according to an embodiment. As shown, heap 830 includes addresses 00000008, 00000016, ..., 00000024, 00000032, ..., 00000048. The call stack local variables 832 include registers r1, r2, r3. In this example, a reference includes 32 bits. The color of a heap reference may be indicated by bits 0-15.
The sample code may include the following:
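Again the listing is not reproduced here; a plausible reconstruction based on the description that follows (the origin of temp3 is an assumption, and the Person class and temp1 are assumed from the earlier example) is:

Person temp2 = new Person();  // allocates a Person object on the heap (address 00000024 in FIG. 8)
String temp3 = temp1.name;    // a colorless reference already held on the call stack (assumed for illustration)
temp2.name = temp3;           // storing this reference into the heap triggers the reference write barrier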
based on the code line Person temp2 = new Person (), the application thread creates a new object in heap 830 and reference temp2 references the new object. The object (referenced by temp 2) belongs to the Person type and contains a name field of the String type. The object (referenced by temp 2) is stored at address "00000024" within heap 830. The name field of the object (referenced by temp 2) is stored at address "00000032" within heap 830. The name field is filled with a reference 805.
Based on the code line temp2.name = temp3, the application thread attempts to write reference 807 from the call stack local variables 832 into heap 830. In particular, the application thread attempts to write reference 807 to address "00000032", which is the location storing the name field of the object referenced by temp2.
The application thread hits the reference write barrier 810. Reference write barrier 810 includes instructions for adding color 806 to reference 807. In particular, the application thread determines which color is currently a good color based on the current GC phase. The application thread then colors reference 807 using the good color. Coloring reference 807 with the good color may include: (a) applying a bitwise left-shift operation to shift the reference left by n positions, where n equals the shift value set by the GC, inserting n 0s in the lowest bits of the reference, and (b) applying a logical bitwise OR to the shifted result and a good-color bitmask that contains the good color set by the GC in its lowest bits (e.g., bits 0-15) and 0 in every other bit. The result of the OR is "00488A40". The application thread writes the result "00488A40" to address "00000032" in heap 830.
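A minimal Java model of this coloring step is shown below. The concrete values in the usage example (a colorless reference of 0x48, a shift value of 16, and a good color of 0x8A40) are assumptions chosen to be consistent with the figure's result of 00488A40; the method names are hypothetical.

final class WriteBarrierModel {
    // Colors a colorless reference before it is stored to the heap. shift is the good shift
    // value; goodColorMask holds the current good color in its lowest bits and 0 elsewhere.
    static long writeBarrier(long colorlessReference, int shift, long goodColorMask) {
        long shifted = colorlessReference << shift; // bitwise left shift makes room for the color bits
        return shifted | goodColorMask;             // logical OR inserts the current good color
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", writeBarrier(0x48L, 16, 0x8A40L)); // prints 00488A40
    }
}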
5. Loading heap references by application threads
FIG. 9 illustrates a set of operations for loading a heap reference by an application thread, according to an embodiment. One or more of the operations illustrated in FIG. 9 may be modified, rearranged, or omitted altogether. Accordingly, the particular order of operations illustrated in FIG. 9 should not be construed as limiting the scope of one or more embodiments. The operations illustrated in FIG. 9 do not limit the manner in which the operations are expressed in a set of code: an operation of FIG. 9 may correspond to a single instruction in a set of code; conversely, a single operation of FIG. 9 may correspond to multiple instructions in a set of code. The operations of FIG. 9 are described as being performed by a single application thread; however, these operations may be performed by multiple application threads and/or GC threads.
One or more embodiments include receiving, by a mutating thread (e.g., an application thread external to the GC), a request to load a reference from heap memory onto a call stack of the application thread (operation 902). During a concurrent phase of the GC system, an application thread executes a set of code (e.g., bytecode). The GC system may update at least the current good color and the current good shift value based on the current phase. In an embodiment, the GC system may update the current good color and the current good shift value using, for example, compiled method entry barrier patching. The set of code executed by the application thread includes a request to load a reference from heap memory onto the call stack of the application thread.
One or more embodiments include identifying and retrieving, by the application thread, the reference from heap memory (operation 904). The application thread identifies the reference (referred to herein as a "heap reference") in heap memory.
Instead of storing the reference directly onto the call stack, the application thread first hits the load barrier. The application thread analyzes the heap reference to determine whether its current GC state is "good".
One or more embodiments include determining whether the correct remap bit among the remap bits in the heap reference's color is set (operation 906). As one example, on the x86-64 system architecture, to determine whether the correct remap bit is set, the mutating thread may execute a load barrier that applies a bitwise right-shift operation to the heap reference. The bitwise right-shift operation shifts the bits of the heap reference right by n positions, where n equals the good shift value. After the shift operation, the carry flag is set to the last bit shifted out of the reference (e.g., the carry flag is 1 if the last bit shifted out is 1; the carry flag is 0 if the last bit shifted out is 0). The shift operation also causes the zero flag to be set if the bits remaining in the reference are all 0.
If the correct GC state remap bit is set ("Yes" in operation 906), one or more embodiments include storing the reference onto the call stack without any indication of any GC state (operation 910). The application thread takes a "fast path," which involves skipping GC operations such as remapping the reference, marking the object as live, and/or updating the GC state of the reference (e.g., the GC operations shown in FIG. 15). The application thread proceeds directly to operation 910, which is discussed further below.
If the correct GC state remap bit is not set ("No" in operation 906), one or more embodiments include reloading the heap reference and completing a set of GC operations to bring the heap reference from its current GC state to the good GC state (operation 908). The application thread selects one slow path operation from a set of candidate slow path operations based on the good GC state and the current GC state of the heap reference. As non-limiting examples, the slow path operation to be performed may include: marking the corresponding object as live and updating the GC state indicated by the heap reference to the good GC state; remapping the reference to a new address, marking the corresponding object as live, and updating the GC state indicated by the heap reference to the good GC state; or remapping the reference to a new address and updating the GC state indicated by the heap reference to the good GC state.
One or more embodiments include storing, by the mutating (application) thread, the reference onto the call stack without any indication of any GC state (operation 910). The logic performed in operation 906 removes any indication of any GC state from the heap reference via the bitwise shift operation. The application thread stores the result of the bitwise shift operation onto the call stack. The heap reference, with its indication of GC state, continues to be stored in heap memory.
In some embodiments, the reference stored onto the call stack does not have an indication of GC state. The reference on the call stack does not include any information about the progress of the GC operation relative to the reference. In particular, the heap reference includes (a) a first set of bits that indicate an address of a corresponding object, and (b) a second set of bits that indicate one or more GC states. The reference stored on the call stack includes (a) a same first set of bits indicating a same address of a same corresponding object, and (b) a third set of bits, different from the second set of bits, that do not indicate any GC state.
In other embodiments, the reference stored onto the call stack does not have an indication as to which state of the mutually exclusive set of GC states is associated with the reference. However, the reference on the call stack may indicate other GC states (e.g., the age of the reference). In particular, the heap reference includes (a) a first set of bits that indicate an address of a corresponding object, (b) a second set of bits that indicate one of a set of mutually exclusive GC states associated with the heap reference, and (c) a third set of bits that indicate one or more other GC states. The reference stored on the call stack includes (a) the same first set of bits indicating the same address of the same corresponding object, and (b) the same third set of bits indicating the same other GC state.
In an embodiment, the application thread attempts to dereference the reference stored on the call stack based on operation 910. As described above, in some embodiments, the hardware system (such as a processor) on which the application thread executes requires that the non-addressable portion of the reference conform to a canonical form before being de-referenced. Thus, before dereferencing (if dereferencing is performed), the processor verifies that the reference complies with the canonical form. Even if the indication of the GC state contained in the heap reference violates the canonical form, the indication has been removed from the reference on the call stack. Thus, the processor determines that the reference on the call stack complies with the canonical form. The application thread thus successfully dereferences the reference on the call stack.
6. Writing heap references by application threads
FIG. 10 illustrates a set of operations for writing a heap reference by an application thread, in accordance with an embodiment. One or more of the operations illustrated in FIG. 10 may be modified, rearranged, or omitted altogether. Accordingly, the particular order of operations illustrated in FIG. 10 should not be construed as limiting the scope of one or more embodiments. The operations illustrated in FIG. 10 do not limit the manner in which the operations are expressed in a set of code: an operation of FIG. 10 may correspond to a single instruction in a set of code; conversely, a single operation of FIG. 10 may correspond to multiple instructions in a set of code. The operations of FIG. 10 are described as being performed by a single application thread; however, these operations may be performed by multiple application threads and/or GC threads.
One or more embodiments include receiving, by a mutating (application) thread, a request to write a reference into heap memory (operation 1002). An application thread executes a set of code (e.g., bytecode). The set of code includes a request to write a reference into heap memory. For example, the request may be to write a reference stored on the call stack of the application thread into heap memory.
In an embodiment, the reference does not have any indication as to which GC state is the current GC state of the reference. The reference does not include any information or metadata indicating the progress of the GC operation with respect to the reference. In another embodiment, the reference does not have any indication as to which of a set of mutually exclusive GC states is the current GC state of the reference; however, the reference may include information about other GC states (e.g., the age of the reference). In an embodiment, the reference to be written may have been previously dereferenced (by the application thread currently attempting to write the reference to heap memory and/or another thread).
One or more embodiments include determining a "good" GC state associated with the GC processing (operation 1004). The application thread determines a "good" GC state associated with the GC processing. In an embodiment, the "good" GC state is indicated by a constant that is updated by the GC processing via compiled method entry barrier patching. In an embodiment, the application thread may create a good bitmask that includes the determined "good" GC state in its lowest bits and 0 in all other bits.
One or more embodiments include storing, by the application thread, the reference into heap memory with an added indication of the good GC state as the current GC state of the reference (operation 1006). The application thread adds an indication of the good GC state as the current GC state of the reference. For example, the application thread may apply a logical bitwise left-shift operation to the reference, shifting the bits of the reference left by n positions, where n equals the good shift value. The application thread may then perform a logical OR of the shifted reference and the good bitmask. The application thread stores the reference, which now includes an indication of its current GC state, into heap memory.
In an embodiment, an indication of GC state is used in subsequent accesses to heap references. As an example, the GC thread executing the tagging phase may identify the heap reference. The GC thread may select a path to take relative to the heap reference based on the GC state indicated by the reference. As another example, an application thread may load heap references. The application thread may hit the load barrier. Within the load barrier, an application thread may select a path to take relative to a heap reference based on the GC state indicated by the reference (e.g., using the operations of fig. 9). After executing the selected path, the application thread may load the heap reference.
7. Miscellaneous; extensions
Embodiments are directed to a system having one or more devices including a hardware processor and configured to perform any of the operations described herein and/or in any of the following claims.
In an embodiment, a non-transitory computer-readable storage medium includes instructions that, when executed by one or more hardware processors, cause performance of any of the operations described herein and/or in the claims.
Any combination of the features and functions described herein may be employed in accordance with one or more embodiments. In the foregoing specification, various embodiments have been described with reference to numerous specific details that may vary from one implementation to another. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what the applicant expects to be the scope of the invention is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.
8. Hardware overview
According to one embodiment, the techniques described herein are implemented by one or more special purpose computing devices. The special purpose computing device may be hardwired to perform the techniques, or may include digital electronics such as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) that are permanently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques in accordance with program instructions in firmware, memory, other storage, or a combination. These special purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to implement the techniques. The special purpose computing device may be a desktop computer system, portable computer system, handheld device, networking device, or any other device that implements techniques in conjunction with hardwired and/or program logic.
For example, FIG. 11 is a block diagram that illustrates a computer system 1100 upon which an embodiment of the invention may be implemented. Computer system 1100 includes a bus 1102 or other communication mechanism for communicating information, and a hardware processor 1104 coupled with bus 1102 for processing information. The hardware processor 1104 may be, for example, a general purpose microprocessor.
Computer system 1100 also includes a main memory 1106, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 1102 for storing information and instructions to be executed by processor 1104. Main memory 1106 also may be used for storing a temporary variable or other intermediate information during execution of instructions to be executed by processor 1104. When stored in a non-transitory storage medium accessible to the processor 1104, the instructions cause the computer system 1100 to be a special purpose machine that is customized to perform the operations specified in the instructions.
Computer system 1100 also includes a Read Only Memory (ROM) 1108 or other static storage device coupled to bus 1102 for storing static information and instructions for processor 1104. A storage device 1110, such as a magnetic disk or optical disk, is provided and storage device 1110 is coupled to bus 1102 for storing information and instructions.
Computer system 1100 may be coupled via bus 1102 to a display 1112, such as a Cathode Ray Tube (CRT), for displaying information to a computer user. An input device 1114 (including alphanumeric and other keys) is coupled to bus 1102 for communicating information and command selections to processor 1104. Another type of user input device is cursor control 1116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1104 and for controlling cursor movement on display 1112. Such input devices typically have two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.
Computer system 1100 may implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the computer system, causes or programs computer system 1100 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 1100 in response to processor 1104 executing one or more sequences of one or more instructions contained in main memory 1106. Such instructions may be read into main memory 1106 from another storage medium, such as storage device 1110. Execution of the sequences of instructions contained in main memory 1106 causes processor 1104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term "storage medium" as used herein refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1110. Volatile media includes dynamic memory, such as main memory 1106. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media are different from, but may be used in conjunction with, transmission media. Transmission media participate in transmitting information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 1104 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on bus 1102. Bus 1102 carries the data to main memory 1106, from which main memory 1106 processor 1104 retrieves and executes the instructions. The instructions received by main memory 1106 may optionally be stored on storage device 1110 either before or after execution by processor 1104.
Computer system 1100 also includes a communication interface 1118 coupled to bus 1102. Communication interface 1118 provides a two-way data communication coupling to a network link 1120, wherein network link 1120 is connected to a local network 1122. For example, communication interface 1118 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1118 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 1120 typically provides data communication through one or more networks to other data devices. For example, network link 1120 may provide a connection through local network 1122 to a host computer 1124 or to data equipment operated by an Internet Service Provider (ISP) 1126. ISP 1126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 1128. Local network 1122 and internet 1128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1120 and through communication interface 1118 are example forms of transmission media, which carry the digital data to computer system 1100 or carry the digital data from computer system 1100.
Computer system 1100 can send messages and receive data, including program code, through the network(s), network link 1120 and communication interface 1118. In the Internet example, a server 1130 might transmit a requested code for an application program through Internet 1128, ISP 1126, local network 1122 and communication interface 1118.
The received code may be executed by processor 1104 as it is received, and/or stored in storage device 1110, or other non-volatile storage for later execution.
In the foregoing specification, embodiments have been described with reference to numerous specific details that vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what the applicant expects to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims (11)

1. One or more non-transitory machine-readable media storing instructions that, when executed by one or more processors, cause performance of operations comprising:
receiving a request from a mutating thread to load a first reference to a first object from heap memory onto a call stack of an application thread;
in response to receiving the request, retrieving a first reference from the heap memory, the first reference comprising (a) a first memory address at which a first object is stored in the heap memory during at least a first period of time, and (b) an indication of a first garbage collection state of a plurality of garbage collection states associated with the first reference;
wherein respective ones of the plurality of garbage collection states indicate at least (a) a stage of the garbage collector when the first reference is written to the heap memory, and (b) a garbage collection generation associated with the first reference;
performing a bitwise shift operation that (a) removes one or more bits representing the first garbage collection state, and (b) generates a second reference from the first reference;
determining, based on a particular bit of the one or more bits, whether to perform a set of garbage collection operations on the first reference to bring the first reference to a good state; and
storing the second reference to the call stack without any indication of any of the plurality of garbage collection states.
2. The medium of claim 1, wherein the particular bit is a last bit shifted out of the first reference by the bitwise shift operation.
3. The medium of claim 1, wherein the operations further comprise:
determining that the particular bit indicates that a relocation phase associated with the first reference matches a current relocation phase of the garbage collector;
in response to determining that the particular bit indicates that a relocation phase associated with the first reference matches a current relocation phase of the garbage collector, performing the set of garbage collection operations on the first reference is avoided.
4. The medium of claim 1, wherein the operations further comprise:
determining that the particular bit indicates that the relocation phase associated with the first reference does not match a current relocation phase of the garbage collector;
the set of garbage collection operations is performed on the first reference in response to determining that the particular bit indicates that the relocation phase associated with the first reference does not match a current relocation phase of the garbage collector.
5. The medium of claim 1, wherein the bitwise shift operation is included in a command to determine a value of a particular bit of the shifted bits.
6. The medium of claim 1, wherein the bitwise shift operation is a right-shift operation, wherein the one or more bits represent the n least significant bits of the first reference.
7. The medium of claim 1, the operations further comprising:
comparing at least one of the removed one or more bits representing the first garbage collection state to a garbage collection state indicated by a garbage collection thread;
determining whether the first memory address is empty;
in response to determining that (a) the at least one of the one or more bits that is removed that represents the first garbage collection state matches a garbage collection state indicated by a garbage collection thread, and (b) the first memory address is not empty, the set of garbage collection operations is performed on the first reference.
8. The medium of claim 1, wherein performing the bitwise shift operation further comprises adding at least one bit having a default value to the first reference to generate the second reference.
9. A method comprising the operations of any one of claims 1-8.
10. A system, comprising:
at least one device comprising a hardware processor;
the system is configured to perform the operations of any of claims 1-8.
11. A system comprising means for performing the operations of any one of claims 1-8.
CN202280046281.7A 2021-05-19 2022-05-13 Colorless root implementation in Z garbage collector Pending CN117581215A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US63/190,621 2021-05-19
US63/190,625 2021-05-19
US63/190,617 2021-05-19
US17/303,634 US11741004B2 (en) 2021-05-19 2021-06-03 Colorless roots implementation in Z garbage collector
US17/303,634 2021-06-03
PCT/US2022/029225 WO2022245659A1 (en) 2021-05-19 2022-05-13 Colorless roots implementation in z garbage collector

Publications (1)

Publication Number Publication Date
CN117581215A true CN117581215A (en) 2024-02-20

Family

ID=89861167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280046281.7A Pending CN117581215A (en) 2021-05-19 2022-05-13 Colorless root implementation in Z garbage collector

Country Status (1)

Country Link
CN (1) CN117581215A (en)

Similar Documents

Publication Publication Date Title
US11573894B2 (en) Tracking garbage collection states of references
CN117581214A (en) Write barrier for memory set maintenance in a hierarchical Z garbage collector
US10733095B2 (en) Performing garbage collection on an object array using array chunk references
US11474832B2 (en) Intelligently determining a virtual machine configuration during runtime based on garbage collection characteristics
CN117581215A (en) Colorless root implementation in Z garbage collector
CN117597671A (en) Start-time snapshot marking in Z garbage collector
EP4291988B1 (en) Tracking frame states of call stack frames including colorless roots
US11573794B2 (en) Implementing state-based frame barriers to process colorless roots during concurrent execution
WO2022245659A1 (en) Colorless roots implementation in z garbage collector
US11513954B2 (en) Consolidated and concurrent remapping and identification for colorless roots
US11789863B2 (en) On-the-fly remembered set data structure adaptation
WO2022245954A1 (en) Write barrier for remembered set maintenance in generational z garbage collector
EP4341819A1 (en) Snapshot at the beginning marking in z garbage collector
US12019541B2 (en) Lazy compaction in garbage collection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination