GitHub - sunzc/Compiler

Here are the contents of this directory:

ICode.l, ICode.y: Flex/Bison files for intermediate code

driver.C: Driver program for intermediate code assembler.

Makefile: Used to compile the assembler

easm: It takes an intermediate code file (e.g., test1.i) as input,
      translates it into C-code (e.g., test1.c), and then compiles
      it using gcc to generate an executable (e.g., test1). 
      The assembler program is recompiled if necessary before these steps.

erun: Assembles the intermediate code and runs it. It requires at least
      one command-line argument, which is the name of the intermediate
      code file (e.g., test1.i). Other command line arguments control
      some printing options.

There is one sample intermediate code file called test1.i included in this
directory. The output produced using the following command is also
included:

./erun test1.i -dr -df -m 10000000 -dm 9990 10000 > test1.out 2> test1.out < test1.in

About the assembler
-------------------

The assembler for the intermediate code ("icode") is written using Flex
and Bison. Since its input is intended to be generated by a compiler, the
assembler does not provide user-friendly syntax error messages --- all you
will get is the line number where there is an error. 

The assembler ensures syntactic and type correctness of icode, and then
translates it into C-code, which is in turn compiled using gcc. Currently,
this compilation is done without optimization: this way, any optimizations
you make will translate to faster performance. (If the -O option to gcc
were used, gcc's own optimizations would likely turn even lousy code
into efficient machine code.)

Machine model
-------------

The intermediate code is intended to model a very simple instruction set.
Complete information about the instruction set can be obtained from the
source code files. In particular, note that each icode instruction is
translated by the assembler into a macro with the same name. The definitions
of these macros appear in E--_RT.c, and should provide you a clear understanding
of the instruction set semantics.

Icode supports 1000 integer registers (R000 to R999) and 1000 floating point
registers (F000 to F999). The size of integers and floating point numbers is the
same as the size of int and float respectively on the underlying architecture
on which the assembled C-code is compiled. 

Icode memory starts at address zero, and goes on to a maximum address
specified as a command-line option to the C-program generated by the
assembler. (The default memory size is 1M words.) Memory is
word-addressed, not byte-addressed. This memory is represented as an array
in the C-program. Guard zones are set up on either side of the array to
catch out-of-bounds accesses (which will trigger a memory exception). To
simplify the organization of memory, we make the assumption that
sizeof(int) == sizeof(float). This enables integers and floating point
numbers to be both stored in one memory word.
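
As an illustration only, the memory model described above can be sketched
in Python. This is not the code in E--_RT.c: the names WordMemory and
MemoryException are hypothetical, the sketch assumes that int and float are
both 32 bits wide, and a simple range check stands in for the guard zones.

```python
import struct


class MemoryException(Exception):
    """Stands in for the runtime's memory exception (hypothetical name)."""


class WordMemory:
    """Word-addressed memory: every cell holds one 32-bit word."""

    def __init__(self, size_in_words):
        self.words = [0] * size_in_words

    def _check(self, addr):
        # The real runtime places guard zones around a C array; this model
        # just rejects any address outside [0, size).
        if not 0 <= addr < len(self.words):
            raise MemoryException(f"address {addr} out of bounds")

    def store_int(self, addr, value):
        self._check(addr)
        self.words[addr] = value & 0xFFFFFFFF

    def load_int(self, addr):
        self._check(addr)
        raw = self.words[addr]
        # Reinterpret the stored bit pattern as a signed 32-bit integer.
        return raw - (1 << 32) if raw & 0x80000000 else raw

    def store_float(self, addr, value):
        self._check(addr)
        # Same cell, same width: store the float's bit pattern in the word.
        (self.words[addr],) = struct.unpack("<I", struct.pack("<f", value))

    def load_float(self, addr):
        self._check(addr)
        (value,) = struct.unpack("<f", struct.pack("<I", self.words[addr]))
        return value


mem = WordMemory(16)
mem.store_int(3, -7)
mem.store_float(4, 1.5)
print(mem.load_int(3), mem.load_float(4))   # prints: -7 1.5
```

Because an integer and a float occupy the same kind of cell, reading word 4
back with load_int would return the float's raw bit pattern rather than 1.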

Execution of icode starts at the first instruction in the icode file,
and execution is stopped when control flows past the last instruction
in the file.
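
The execution rule above amounts to a program counter that starts at the
first instruction and stops once it moves past the last one. A minimal
Python sketch of that loop follows. It is a model, not the assembler's
output: the tuple encoding of instructions, the labels dictionary, and the
function name run are assumptions made for the example, and only a handful
of instructions are handled.

```python
def run(program, labels):
    """Run a list of instruction tuples; labels maps a label to an index."""
    # R000 .. R999; the initial value 0 is an assumption of this sketch.
    regs = {f"R{i:03d}": 0 for i in range(1000)}
    output = []
    pc = 0                      # execution starts at the first instruction

    def val(x):
        # A source operand is either a register name or a plain value.
        return regs[x] if isinstance(x, str) else x

    while pc < len(program):    # stop when control flows past the last one
        op, *args = program[pc]
        pc += 1
        if op == "MOVI":
            src, dest = args
            regs[dest] = val(src)
        elif op == "ADD":
            a, b, dest = args   # sources first, destination last
            regs[dest] = val(a) + val(b)
        elif op == "PRTI":
            output.append(val(args[0]))
        elif op == "JMP":
            pc = labels[args[0]]
        elif op == "JMPC":
            cond, a, b, target = args
            taken = {"GT": val(a) > val(b), "GE": val(a) >= val(b)}[cond]
            if taken:
                pc = labels[target]
        else:
            raise ValueError(f"unsupported instruction {op}")
    return output


# Count from 1 to 3; the label "loop" refers to the instruction at index 1.
program = [
    ("MOVI", 1, "R000"),
    ("PRTI", "R000"),
    ("ADD", "R000", 1, "R000"),
    ("JMPC", "GE", 3, "R000", "loop"),
]
print(run(program, {"loop": 1}))   # prints [1, 2, 3]
```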

The instructions can be divided into the following categories. In all cases,
destination operands are listed following the source operands. Any
instruction can have a label.

-- integer arithmetic and bit operations: includes ADD, SUB, DIV, MUL,
      MOD, NEG, AND, OR and XOR. The last 3 operations are bit-wise
      operations, applying the specified Boolean operation to corresponding
      bits of
      two integer operands. These operations have two source operands (just
      one source in the case of NEG) that can either be a value or
      a register, and a destination operand that must be a register.

-- floating point arithmetic:
      includes FADD, FSUB, FDIV, FMUL and FNEG. They have two source operands
      (one in the case of FNEG) that can be values or registers, and a 
      destination operand that must be a register.

-- integer relational operations: includes GT and GE, which treat their
      two operands as signed integers; UGT and UGE, which operate on unsigned
      integers; and EQ and NE, which operate on integers regardless of size.
      All operands can be values or registers. 

      NOTE: Relational operators are not stand-alone instructions, but 
      instead, appear as part of a conditional jump instruction.

-- floating point relational operations: includes FGT, FGE, FEQ and FNE 
      that operate on floating point operands (values or registers).

-- print instructions: include PRTI, PRTS, and PRTF, each of which takes
      a single operand. In the case of PRTI, this operand represents the 
      integer value to be printed, or the register that needs to be printed. 
      PRTF is similar, except that it prints a floating point operand.
      In the case of PRTS, the operand is a string constant (enclosed within
      double quotes) or a register containing a string constant, i.e., the
      register was previously initialized by a MOVS <string_constant> <reg>. 

-- jump instructions: 
      unconditional jump: JMP <loc>, where <loc> is a label. 
      conditional jump: JMPC <cond> <loc>, where <cond> is a relational
         operator with parameters. Example: JMPC GE R000 R001 <loc>
      indirect jump: JMPI <reg>, where <reg> is an integer register that 
         has previously been initialized with a label value using a MOVL
         instruction.
      conditional indirect jump: JMPCI <cond> <reg>, which combines the two
         forms above. Example: JMPCI FGE F001 1.0 R010

-- data movement instructions: 
      move label to a register: MOVL <label> <intreg>
      move string to register:  MOVS <stringConstant> <intreg>

      move integer to register: MOVI <valueOrReg> <intreg>
      move float to register:   MOVF <valueOrReg> <freg>

      move int to float reg:    MOVIF <intreg> <freg>
      move float to int reg:    MOVFI <freg> <intreg>

      load int reg from mem:    LDI <reg> <valueOrReg>
      load float reg from mem:  LDF <reg> <valueOrReg>
      store int reg to mem:     STI <reg> <valueOrReg>
      store float reg to mem:   STF <reg> <valueOrReg>

-- input instruction (for reading input data)
      IN <reg> reads a single byte from input stream and stores it into
         the specified register. A negative return value indicates an
         error, with the semantics the same as that of getc.
      INI <reg>, INF <reg> read an integer (or floating point number) into the 
         specified register. Aborts execution if any errors are encountered.
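
Three details from the list above are easy to get wrong: the difference
between signed and unsigned comparisons, the effect of the bit-wise
operations, and the value IN yields at end of input. The Python sketch below
illustrates them. It is a model rather than the runtime's code: the helper
names are hypothetical, a 32-bit int is assumed, and -1 is used as the
negative end-of-input value (getc returns EOF, which is negative).

```python
import io

WORD_MASK = 0xFFFFFFFF  # assumption: int is 32 bits wide


def unsigned(x):
    """Reinterpret a signed 32-bit value as an unsigned one."""
    return x & WORD_MASK


def gt(a, b):
    """GT: compare the operands as signed integers."""
    return a > b


def ugt(a, b):
    """UGT: compare the same bit patterns as unsigned integers."""
    return unsigned(a) > unsigned(b)


def read_byte(stream):
    """Model of IN: the next byte as a non-negative int, or -1 at the end."""
    data = stream.read(1)
    return data[0] if data else -1


# -1 is below 1 when signed, but its bit pattern is the largest unsigned value.
print(gt(-1, 1), ugt(-1, 1))            # prints: False True

# AND, OR and XOR work on corresponding bits: 1100 and 1010 in binary.
print(12 & 10, 12 | 10, 12 ^ 10)        # prints: 8 14 6

stdin = io.BytesIO(b"A")
print(read_byte(stdin), read_byte(stdin))   # prints: 65 -1
```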