A high-level programming language designed for the manipulation of audio files
Vishruth Devan | vd2461 | vd2461@columbia.edu |
---|
- Install Docker from https://docs.docker.com/get-started/get-docker/
- Clone the repository: `git clone https://github.com/vishruthdevan/wavy.git`
- Run the `test-lexer.sh` script: `./test-lexer.sh /wavy/<path-to-sample-file.vy>`
- If you get a permission denied error while running the script, run the following command and try again: `chmod 755 test-lexer.sh`
- Example Usage: `./test-lexer.sh /wavy/lexer/samples/sample_1.vy`
- The expected outputs are the `.out` files in the `lexer/samples/expected_outputs/` directory. Running the script will generate `.out` files in the same directory as the input file. For example, if `/wavy/lexer/samples/sample_1.vy` was the input, the output will be written to `/wavy/lexer/samples/sample_1.vy.out`.
- Run the `test-parser.sh` script: `./test-parser.sh /wavy/<path-to-sample-file.vy>`
- If you get a permission denied error while running the script, run the following command and try again: `chmod 755 test-parser.sh`
- Example Usage: `./test-parser.sh /wavy/parser/samples/sample_1.vy`
- The expected outputs are the `.out` files in the `parser/samples/expected_outputs/` directory. Running the script will generate `.out` files in the same directory as the input file. For example, if `/wavy/parser/samples/sample_1.vy` was the input, the output will be written to `/wavy/parser/samples/sample_1.vy.out`.
- `sample_1.vy` and `sample_4.vy` have been intentionally modified to make the parser identify errors.
- Run the `test-compiler.sh` script: `./test-compiler.sh /wavy/<path-to-sample-file.vy>`
- If you get a permission denied error while running the script, run the following command and try again: `chmod 755 test-compiler.sh`
- Example Usage: `./test-compiler.sh /wavy/compiler/samples/sample_1.vy`
- The expected outputs are the `.out` files in the `compiler/samples/expected_outputs/` directory. Running the script will generate `.out` files in the same directory as the input file. For example, if `/wavy/compiler/samples/sample_1.vy` was the input, the output will be written to `/wavy/compiler/samples/sample_1.vy.out`.
- Run the `test-vm.sh` script: `./test-vm.sh /wavy/<path-to-sample-file.vy>`
- If you get a permission denied error while running the script, run the following command and try again: `chmod 755 test-vm.sh`
- Example Usage: `./test-vm.sh /wavy/vm/samples/sample_1.vy`
- The expected outputs are the `.out` files in the `vm/samples/expected_outputs/` directory. Running the script will generate `.out` files in the same directory as the input file. For example, if `/wavy/vm/samples/sample_1.vy` was the input, the output will be written to `/wavy/vm/samples/sample_1.vy.out`.
- `sample_1.vy.incorrect` has been intentionally modified to make the parser identify errors.
These are reserved words with specific meanings that cannot be used as identifiers.
Keywords: function, return, if, else, true, false, null, for, in, load, export
Rules:
- Exact match with the string in the `keywords` map.
- Case-sensitive.
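The keyword rules above amount to an exact, case-sensitive table lookup. The following is a minimal sketch of that idea, written in Python purely for illustration. The table name `KEYWORDS`, the function `lookup_identifier`, and the upper-cased token names are assumptions, not taken from the Wavy source.

```python
# Hypothetical keyword table built from the keyword list in this README.
# The token names (upper-cased keywords) are an assumption for illustration.
KEYWORDS = {
    word: word.upper()
    for word in ("function", "return", "if", "else", "true", "false",
                 "null", "for", "in", "load", "export")
}


def lookup_identifier(word: str) -> str:
    """Return the keyword token for an exact, case-sensitive match, else IDENTIFIER."""
    return KEYWORDS.get(word, "IDENTIFIER")
```

Because the lookup is case-sensitive, `lookup_identifier("return")` yields a keyword token while `lookup_identifier("Return")` is treated as an ordinary identifier.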
An identifier represents variables, functions, or other user-defined or built-in names.
Valid Characters:
- Letters (`a-z`, `A-Z`)
- Digits (`0-9`)
- Underscore (`_`)
- Dollar sign (`$`)

Rules:
- Cannot start with a digit.
- Example: `foo`, `bar_123`, `$myVar`.
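The identifier rules can be expressed as a single regular expression. This is a sketch under the assumption that `_` and `$` are allowed in the first position (the `$myVar` example suggests so). The names `IDENTIFIER_RE` and `is_identifier` are hypothetical.

```python
import re

# First character: letter, underscore, or dollar sign (never a digit).
# Remaining characters: letters, digits, underscore, or dollar sign.
IDENTIFIER_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_identifier(text: str) -> bool:
    """Check whether the whole string is a valid identifier under the rules above."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```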
Types:
- INTEGER: Whole numbers without a decimal point (e.g., `42`).
- FLOAT: Numbers with a decimal point (e.g., `3.14`).
Rules:
- INTEGER: Sequence of digits.
- FLOAT: Contains a decimal point and digits on both sides of the decimal point.
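As a rough illustration of the two number rules, the sketch below classifies a candidate lexeme as `INTEGER`, `FLOAT`, or invalid. The function name `classify_number` and the use of `None` for invalid input are assumptions made for this example.

```python
import re
from typing import Optional

INTEGER_RE = re.compile(r"[0-9]+")          # a sequence of digits
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+")    # digits on both sides of one decimal point


def classify_number(text: str) -> Optional[str]:
    """Return "INTEGER" or "FLOAT" for a valid number, or None otherwise."""
    if INTEGER_RE.fullmatch(text):
        return "INTEGER"
    if FLOAT_RE.fullmatch(text):
        return "FLOAT"
    return None
```

Inputs such as `12.34.`, `.5`, or `5.` match neither pattern, which corresponds to the invalid-number case.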
A string is a sequence of characters enclosed in double quotes (`"`) or single quotes (`'`).
Rules:
- The string must be enclosed in a matching pair of quotes.
- Can contain any Unicode character.
- Example: `"Hello, World!"`
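A minimal sketch of reading a string literal under these rules follows. It assumes there are no escape sequences, since none are described here. The function name `read_string` and its return shape are hypothetical.

```python
from typing import Tuple


def read_string(source: str, start: int) -> Tuple[str, int]:
    """Read a quoted string beginning at source[start].

    Returns the string contents and the index just past the closing quote.
    Raises ValueError if the matching closing quote is missing.
    """
    quote = source[start]
    if quote not in ('"', "'"):
        raise ValueError("not at the start of a string literal")
    end = source.find(quote, start + 1)  # the closing quote must match the opening one
    if end == -1:
        raise ValueError("Unterminated string")
    return source[start + 1:end], end + 1
```

Because the closing quote must match the opening one, a single-quoted string may contain a double quote and vice versa.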
Operator | Token | Description |
---|---|---|
`=` | `ASSIGN` | Assignment |
`+` | `PLUS` | Addition |
`-` | `MINUS` | Subtraction/Negation |
`*` | `ASTERISK` | Multiplication |
`/` | `SLASH` | Division |
`!` | `BANG` | Logical NOT |
`<` | `LT` | Less than |
`>` | `GT` | Greater than |
`==` | `EQUALS` | Equality comparison |
`!=` | `NOT_EQUALS` | Inequality comparison |
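The operator table maps directly onto a scanner that checks for two-character operators before falling back to single-character ones. The sketch below illustrates this with one character of lookahead. The names `SINGLE`, `DOUBLE`, and `scan_operators` are hypothetical, and whitespace handling is simplified.

```python
# Token names come from the operator table; the data structures are illustrative.
SINGLE = {"=": "ASSIGN", "+": "PLUS", "-": "MINUS", "*": "ASTERISK",
          "/": "SLASH", "!": "BANG", "<": "LT", ">": "GT"}
DOUBLE = {"==": "EQUALS", "!=": "NOT_EQUALS"}


def scan_operators(source: str) -> list:
    """Tokenize a string made up of operators and whitespace."""
    tokens = []
    i = 0
    while i < len(source):
        pair = source[i:i + 2]           # current character plus one of lookahead
        if pair in DOUBLE:
            tokens.append((DOUBLE[pair], pair))
            i += 2
        elif source[i] in SINGLE:
            tokens.append((SINGLE[source[i]], source[i]))
            i += 1
        elif source[i].isspace():
            i += 1
        else:
            tokens.append(("ILLEGAL", source[i]))
            i += 1
    return tokens
```

Checking the two-character table first is what keeps `==` from being read as two `ASSIGN` tokens.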
Symbol | Token | Description |
---|---|---|
`,` | `COMMA` | Separator |
`;` | `SEMICOLON` | Statement terminator |
`:` | `COLON` | Type separator |
`(` | `LPR` | Left parenthesis |
`)` | `RPR` | Right parenthesis |
`{` | `LBRACE` | Left brace |
`}` | `RBRACE` | Right brace |
`[` | `LBRACKET` | Left bracket |
`]` | `RBRACKET` | Right bracket |
- Token: `EOF`
- Description: Signals the end of the input stream.
- Token: `ILLEGAL`
- Description: Any unrecognized or invalid character.
- Initialization: The lexer starts with the given input and sets up the necessary positions (line, column, etc.).
- Reading Characters: It moves one character at a time, advancing through the input.
- Skipping Whitespace: If it encounters spaces, tabs, or newlines, it skips them until it finds a meaningful character.
- Identifying Tokens:
  - For each character, it checks if it matches a known token.
  - For complex tokens (like `==`), it looks ahead (peeks) to decide if it's a multi-character token.
- Handling Identifiers and Keywords:
  - If it encounters a letter or valid character (like `_`), it reads an entire word.
  - It then checks if the word is a keyword or just a regular identifier.
- Handling Numbers:
  - If it finds a digit, it reads a full numeric sequence, supporting both integers and floats.
  - If it detects an invalid number, it raises an error.
- Reading Strings:
  - For strings enclosed in `"` or `'`, it reads until it finds the closing quote or raises an error for an unterminated string.
- Error Handling:
  - An error is thrown for situations where an invalid number, unterminated string or illegal character is detected.
  - The lexer only reports the error and continues with scanning the rest of the input.
- End of Input:
  - When the lexer reaches the end of the input, it emits an `EOF` (End of File) token.
- Returning Tokens:
  - For each token identified, the lexer advances and returns it to the caller for further processing.
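The steps above can be sketched as a single scanning loop. The version below is deliberately reduced: it handles only identifiers, keywords, integers, and a handful of single-character tokens, and it omits floats, strings, and invalid-number detection. All names (`tokenize`, `SIMPLE_TOKENS`, and so on) are hypothetical, and the error message format is borrowed from the error table in this README. It shows position tracking, whitespace skipping, report-and-continue error handling, and the final `EOF` token.

```python
import string

KEYWORDS = {"function", "return", "if", "else", "true", "false",
            "null", "for", "in", "load", "export"}
# Only a subset of the single-character tokens, to keep the sketch short.
SIMPLE_TOKENS = {"=": "ASSIGN", "+": "PLUS", ";": "SEMICOLON", "(": "LPR", ")": "RPR"}
IDENT_START = set(string.ascii_letters + "_$")
IDENT_PART = IDENT_START | set(string.digits)


def tokenize(source):
    """Return (tokens, errors); scanning continues after an illegal character."""
    tokens, errors = [], []
    i, line, col = 0, 1, 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":                       # newline: next line, reset column
            i, line, col = i + 1, line + 1, 1
            continue
        if ch in " \t\r":                    # skip other whitespace
            i, col = i + 1, col + 1
            continue
        start = i
        if ch in IDENT_START:                # read a whole word
            while i < len(source) and source[i] in IDENT_PART:
                i += 1
            word = source[start:i]
            kind = word.upper() if word in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        elif ch in string.digits:            # read a run of digits
            while i < len(source) and source[i] in string.digits:
                i += 1
            tokens.append(("INTEGER", source[start:i]))
        elif ch in SIMPLE_TOKENS:
            i += 1
            tokens.append((SIMPLE_TOKENS[ch], ch))
        else:                                # report the error and keep going
            i += 1
            errors.append(
                f'Lexical error at line {line}, position {col}: Illegal character "{ch}"'
            )
            tokens.append(("ILLEGAL", ch))
        col += i - start
    tokens.append(("EOF", ""))
    return tokens, errors
```

Collecting errors in a list rather than stopping at the first one mirrors the report-and-continue behavior described above.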
Error Type | Description | Example Input | Error Message |
---|---|---|---|
Illegal Character | Encountered an unrecognized or invalid character. | `^foo = 10` | `Lexical error at line 1, position 1: Illegal character "^"` |
Unterminated String | A string literal is not properly closed with a matching quote. | `"hello` | `Lexical error at line 1, position 7: Unterminated string` |
Invalid Number | Incorrect number format detected (e.g., multiple dots). | `12.34.`, `123abc` | `Lexical error at line 1, position 6: Invalid number` |
- Program: Represents the full program consisting of statements.
- Statement: Represents a general statement (e.g., assignment, return, expression).
- Expression: Represents expressions including arithmetic operations, boolean values, function calls, etc.
- Block: Represents a block of statements, often used in function bodies, if statements, and for loops.
- Literal: Represents literals like integers, floats, strings, and boolean values.
- IDENTIFIER: Identifier (variable/function name).
- ASSIGN: =
- RETURN: return
- IF, ELSE, FOR: Keywords for control flow.
- PLUS, MINUS, ASTERISK, SLASH: Arithmetic operators (+, -, *, /).
- LPAREN, RPAREN: Parentheses ((, )).
- LBRACE, RBRACE: Braces ({, }).
- SEMICOLON: ;
- INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL, BOOL_LITERAL: Various literal types.
<Program> → <StatementList>
<StatementList> → <Statement> <StatementList>
| ε
<Statement> → <ExpressionStatement>
| <AssignmentStatement>
| <ReturnStatement>
| <IfStatement>
| <ForLoopStatement>
| <FunctionDeclaration>
<ExpressionStatement> → <Expression> SEMICOLON
<AssignmentStatement> → IDENTIFIER ASSIGN <Expression> SEMICOLON
<ReturnStatement> → RETURN <Expression> SEMICOLON
<IfStatement> → IF LPAREN <Expression> RPAREN LBRACE <Block> RBRACE
| IF LPAREN <Expression> RPAREN LBRACE <Block> RBRACE ELSE LBRACE <Block> RBRACE
<ForLoopStatement> → FOR LPAREN <Expression> RPAREN LBRACE <Block> RBRACE
<FunctionDeclaration> → IDENTIFIER LPAREN <ParameterList> RPAREN LBRACE <Block> RBRACE
<ParameterList> → IDENTIFIER <ParameterListTail>
| ε
<ParameterListTail> → COMMA IDENTIFIER <ParameterListTail>
| ε
<Block> → <StatementList>
<Expression> → <Literal>
| IDENTIFIER
| <PrefixExpression>
| <InfixExpression>
| <FunctionCall>
| <GroupedExpression>
| <ArrayLiteral>
| <IndexExpression>
<PrefixExpression> → (BANG | MINUS) <Expression>
<InfixExpression> → <Expression> (PLUS | MINUS | ASTERISK | SLASH) <Expression>
<FunctionCall> → IDENTIFIER LPAREN <ArgumentList> RPAREN
<ArgumentList> → <Expression> <ArgumentListTail>
| ε
<ArgumentListTail> → COMMA <Expression> <ArgumentListTail>
| ε
<GroupedExpression> → LPAREN <Expression> RPAREN
<ArrayLiteral> → LBRACKET <ExpressionList> RBRACKET
<ExpressionList> → <Expression> <ExpressionListTail>
| ε
<ExpressionListTail> → COMMA <Expression> <ExpressionListTail>
| ε
<IndexExpression> → <Expression> LBRACKET <Expression> RBRACKET
<Literal> → INT_LITERAL
| FLOAT_LITERAL
| STRING_LITERAL
| BOOL_LITERAL
- The Wavy parser combines recursive descent parsing for statements with Pratt parsing for expressions. In the recursive descent part, each grammar rule is represented by its own function, which keeps the parser modular and easy to read and maintain.
- For expressions, Wavy uses Pratt parsing, which handles operator precedence and associativity by giving each operator a binding power, so nested expressions can be parsed without a separate grammar rule for every precedence level.
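To make the Pratt parsing idea concrete, here is a small self-contained sketch that parses infix and prefix expressions and returns a fully parenthesized string showing how operators bind. The precedence values, the `Parser` class, and the helper names are assumptions chosen for illustration, since this README does not list Wavy's actual precedence levels.

```python
import re

# Assumed binding powers: higher binds tighter. Not taken from the Wavy source.
PRECEDENCE = {"==": 1, "!=": 1, "<": 2, ">": 2, "+": 3, "-": 3, "*": 4, "/": 4}
PREFIX = 5  # binding power of prefix ! and -
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_$][A-Za-z0-9_$]*|==|!=|[-+*/<>!()])")


def tokenize(text):
    """Split an expression into number, identifier, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at index {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expression(self, min_prec=0):
        token = self.advance()
        if token is None or token in PRECEDENCE or token == ")":
            if token not in ("-",):
                raise SyntaxError(f"unexpected token: {token!r}")
        if token in ("!", "-"):                  # prefix operator
            left = f"({token}{self.parse_expression(PREFIX)})"
        elif token == "(":                       # grouped expression
            left = self.parse_expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        else:                                    # literal or identifier
            left = token
        # Keep consuming infix operators that bind tighter than the caller's.
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] > min_prec:
            op = self.advance()
            right = self.parse_expression(PRECEDENCE[op])
            left = f"({left} {op} {right})"
        return left


def parse(text):
    parser = Parser(tokenize(text))
    result = parser.parse_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return result
```

Passing the operator's own precedence into the recursive call, combined with the strict `>` comparison, is what makes operators of equal precedence left-associative in this sketch.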
URL: https://youtu.be/WfligR-tuQg
- After the source code is parsed, the output is passed to the compiler, which generates an Intermediate Representation (IR) of the code. The IR consists of instructions formatted as: `<StackPosition> <OpCode> <Argument>`
- Example: `0025 OpConstant 10` or `0046 OpArray 13`.
- Each instruction operates on a single argument, which keeps the IR simple and easy to optimize.
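As an illustration of the three-field instruction format, the sketch below parses one line of IR text into a small record. The `Instruction` dataclass and `parse_instruction` function are hypothetical, and the sketch assumes the position and argument fields are always decimal integers.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    position: int   # the <StackPosition> field
    opcode: str     # the <OpCode> field, e.g. "OpConstant"
    argument: int   # the <Argument> field


def parse_instruction(line: str) -> Instruction:
    """Parse a line such as '0025 OpConstant 10' into an Instruction."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    position, opcode, argument = parts
    return Instruction(int(position), opcode, int(argument))
```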
- A virtual machine (VM) reads the IR instructions and processes them sequentially. It uses a virtual stack to evaluate each instruction, managing the call stack during execution.
- The VM continues until all instructions are processed, and the stack returns to its initial state.
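A stack-based VM of this kind can be sketched in a few lines. In the example below only `OpConstant` appears in this README. `OpAdd`, `OpMul`, and `OpPop`, the separate constants pool, and the `run` function are assumptions introduced for illustration. The sketch does not model a call stack.

```python
def run(instructions, constants):
    """Execute (opcode, argument) pairs in order and return the last popped value."""
    stack = []
    last_popped = None
    for opcode, argument in instructions:
        if opcode == "OpConstant":
            # Assumption: the argument is an index into a constants pool.
            stack.append(constants[argument])
        elif opcode in ("OpAdd", "OpMul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == "OpAdd" else left * right)
        elif opcode == "OpPop":
            last_popped = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    if stack:
        # The stack should be back in its initial (empty) state at the end.
        raise RuntimeError("stack not empty after execution")
    return last_popped
```

For example, the expression `(1 + 2) * 3` could be encoded as two `OpConstant` pushes, an `OpAdd`, another push, an `OpMul`, and a final `OpPop` that leaves the stack empty again.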
- In the Wavy programming language, variables are scoped within braces `{}` and can only be accessed within the scope in which they are defined.
- A symbol table is used to manage variables, and they are "popped" from the stack when their scope ends.
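One common way to realize brace scoping is a stack of dictionaries, one per open scope. The sketch below follows that pattern. The `SymbolTable` class and its method names are hypothetical, and it assumes an inner scope can read variables from the scopes that enclose it.

```python
class SymbolTable:
    """A stack of scopes; the last entry is the innermost open scope."""

    def __init__(self):
        self.scopes = [{}]  # start with a single global scope

    def enter_scope(self):
        """Called on '{': open a new innermost scope."""
        self.scopes.append({})

    def leave_scope(self):
        """Called on '}': pop the innermost scope and its variables."""
        if len(self.scopes) == 1:
            raise RuntimeError("cannot leave the global scope")
        self.scopes.pop()

    def define(self, name, value):
        self.scopes[-1][name] = value

    def resolve(self, name):
        # Search from the innermost scope outward (an assumption of this sketch).
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undefined variable: {name}")
```

Once `leave_scope` runs, any variable defined inside the block is gone, so a later `resolve` call for it fails.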
- The compiler includes error handling to manage issues like:
  - Unknown operators
  - Undefined variables
- These errors are detected during compilation and flagged accordingly.
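The two error cases can be illustrated with a toy expression compiler. Everything here is an assumption made for the example: the tuple-based AST shape, the `CompileError` class, and the opcode names other than `OpConstant`. The point is only that both checks happen while emitting instructions, before anything is executed.

```python
OPCODES = {"+": "OpAdd", "-": "OpSub", "*": "OpMul", "/": "OpDiv"}  # hypothetical names


class CompileError(Exception):
    """Raised when the compiler rejects the program."""


def compile_expression(node, defined, out):
    """Append instructions for node to out.

    node is a number, a variable name (str), or an (operator, left, right) tuple.
    defined is the set of variable names known at this point.
    """
    if isinstance(node, (int, float)):
        out.append(("OpConstant", node))
    elif isinstance(node, str):
        if node not in defined:
            raise CompileError(f"undefined variable: {node}")
        out.append(("OpGetVariable", node))
    else:
        operator, left, right = node
        if operator not in OPCODES:
            raise CompileError(f"unknown operator: {operator}")
        compile_expression(left, defined, out)
        compile_expression(right, defined, out)
        out.append((OPCODES[operator], None))
    return out
```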
This structure ensures that the code is executed efficiently, supports variable scoping, and allows for optimized compilation processes.