# cleks.h

`cleks.h` is a highly configurable, header-only C lexer library capable of tokenizing strings based on user-defined rules for any language, data format, or custom syntax.
For a rewritten and incremental version of this lexer, see `cleks2.h`.
- Download `cleks.h` or clone the repository for all templates and examples.
- Include `cleks.h` in your project:

```c
#include "cleks.h"
```
To tokenize a string, use the `Cleks_lex` function:

```c
CleksTokens* Cleks_lex(char *buffer, size_t buffer_size, CleksConfig config);
```

This function takes a buffer and its size as arguments, together with a `CleksConfig` struct. Based on the provided rules, the lexer allocates a `CleksTokens` structure, which is a dynamic array of `CleksToken`s.
```c
typedef struct{
    CleksToken **items;  // the array of CleksTokens
    size_t size;         // the number of tokens
    size_t capacity;     // the token capacity
    CleksConfig config;  // the CleksConfig struct provided by the user
} CleksTokens;
```
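The `size` and `capacity` fields follow the familiar growable-array pattern. As a sketch of that pattern only (the type, names, and growth factor here are illustrative, not taken from `cleks.h`), an append with capacity doubling looks like this:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct{
    int *items;       // the array of elements
    size_t size;      // the number of elements
    size_t capacity;  // the element capacity
} IntArray;

// Append one element, doubling the capacity whenever the array is full.
void array_append(IntArray *arr, int value)
{
    if (arr->size >= arr->capacity){
        arr->capacity = (arr->capacity == 0) ? 8 : arr->capacity * 2;
        arr->items = realloc(arr->items, arr->capacity * sizeof(*arr->items));
        assert(arr->items != NULL);
    }
    arr->items[arr->size++] = value;
}
```

`Cleks_append_token` (see the API overview below) plays the same role for `CleksTokens`.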
Each token has a `type` and a `value` field; `value` contains the string content for default tokens.

```c
typedef int CleksTokenType;

typedef struct{
    CleksTokenType type;
    char *value;
} CleksToken;
```
Cleks defines four default tokens with the following `CleksTokenType`s:

- TokenType = -4: `CLEKS_STRING` -> a string was found (content in `CleksToken::value`)
- TokenType = -3: `CLEKS_WORD` -> an unknown word was found (content in `CleksToken::value`)
- TokenType = -2: `CLEKS_INT` -> an integer was found (content as string in `CleksToken::value`)
- TokenType = -1: `CLEKS_FLOAT` -> a float was found (content as string in `CleksToken::value`)
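How `cleks.h` decides between these types internally is not shown here; as a rough sketch of the idea (the function name and exact rules are assumptions for illustration), a word can be classified as integer, float, or plain word using the standard `strtol`/`strtod` parsers:

```c
#include <stdlib.h>

typedef enum { KIND_INT, KIND_FLOAT, KIND_WORD } WordKind;

// Classify a NUL-terminated word the way a lexer might:
// if it fully parses as an integer -> int; as a float -> float;
// otherwise it is an unknown word.
WordKind classify_word(const char *word)
{
    char *end;
    (void)strtol(word, &end, 10);
    if (end != word && *end == '\0') return KIND_INT;
    (void)strtod(word, &end);
    if (end != word && *end == '\0') return KIND_FLOAT;
    return KIND_WORD;
}
```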
Cleks allows full customization of the lexer via the `CleksConfig` struct.

```c
typedef struct{
    CleksTokenConfig *default_tokens;
    size_t default_token_count;
    CleksTokenConfig *custom_tokens;
    size_t custom_token_count;
    const char* const whitespaces;
    CleksString *strings;
    size_t string_count;
    CleksComment *comments;
    size_t comment_count;
    uint8_t flags;
} CleksConfig;
```
- `default_tokens` - the default tokens of the lexer (always provide `CleksDefaultTokenConfig`)
- `default_token_count` - the number of default tokens (use `CLEKS_TOKEN_COUNT`)
- `custom_tokens` - an array of custom `CleksTokenConfig`s
- `custom_token_count` - the number of custom tokens (can be obtained via the `CLEKS_ARR_LEN` macro)
- `whitespaces` - a string of characters for the lexer to skip
- `strings` - an array of `CleksString`s defining string beginning and end delimiters
- `string_count` - the number of strings
- `comments` - an array of `CleksComment`s defining comment beginning and end delimiters
- `comment_count` - the number of comments
- `flags` - a single-byte flag mask containing further lexing rules (`CLEKS_FLAG_DEFAULT` by default)
To define custom tokens, provide an array of `CleksTokenConfig`s.

```c
typedef struct{
    char* print_string;  // the string to print for Cleks_print_tokens
    char* word;          // the string defining a word, "" for non-words
    char symbol;         // the character defining a symbol, '\0' for non-symbols
} CleksTokenConfig;
```
As can be seen above, there are two types of custom tokens: WORDS and SYMBOLS.

- SYMBOL: a single-character token (matched with priority)
- WORD: a multi-character string token
For example:
```c
CleksTokenConfig TestTokenConfig[] = {
    {"If", "if", '\0'},
    {"Left-Bracket", "", '('},
    {"Right-Bracket", "", ')'}
};
```
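Because symbols take priority, a matching single character like `(` is consumed before any multi-character word lookup is attempted. A minimal sketch of such a per-character check (illustrative only; `match_symbol` is not part of the cleks API):

```c
#include <string.h>

// Returns the index of `c` in the symbol set `symbols`, or -1 when
// `c` is not a symbol (and a word lookup should be tried instead).
int match_symbol(const char *symbols, char c)
{
    const char *p = strchr(symbols, c);
    return (p != NULL && c != '\0') ? (int)(p - symbols) : -1;
}
```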
Custom tokens start at TokenType = 0, which means you can index your custom token array directly by a token's type, as long as the type is positive. For this you can also use the `CLEKS_TOKEN_IS_CUSTOM` macro provided by Cleks:

```c
for (size_t i = 0; i < tokens->size; ++i){
    CleksToken *token = tokens->items[i];
    if (CLEKS_TOKEN_IS_CUSTOM(token)){
        printf("Token Found: %s\n", TestTokenConfig[token->type].print_string);
    }
}
```
To define custom comments, provide an array of `CleksComment`s.

```c
typedef struct{
    const char* const start_del;  // the string defining a comment's beginning
    const char* const end_del;    // the string defining a comment's end
} CleksComment;
```
For example:
```c
CleksComment TestComments[] = {
    {"//", "\n"},
    {"#", "\n"},
    {"/*", "*/"}
};
```
To define custom strings, provide an array of `CleksString`s.

```c
typedef struct{
    const char start_del;  // the character defining a string's beginning
    const char end_del;    // the character defining a string's end
} CleksString;
```
For example:
```c
CleksString TestStrings[] = {
    {'"', '"'},
    {'[', ']'}
};
```
The `flags` mask is used to further customize the behaviour of the lexer. Currently these flags are available:

- `CLEKS_FLAG_DEFAULT` - the default behaviour
- `CLEKS_FLAG_NO_INTEGERS` - don't recognize integers; emit `CLEKS_WORD` instead
- `CLEKS_FLAG_NO_FLOATS` - don't recognize floats; emit `CLEKS_WORD` instead

You can combine these flags using bitwise OR:

```c
.flags = CLEKS_FLAG_NO_INTEGERS | CLEKS_FLAG_NO_FLOATS
```
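Such a mask works by assigning each flag its own bit, so flags combine with `|` and are tested with `&`. The bit values below are hypothetical, chosen only to illustrate the pattern; the real constants are defined in `cleks.h`:

```c
#include <stdint.h>

// Hypothetical bit assignments for illustration only.
#define FLAG_DEFAULT     0x00u
#define FLAG_NO_INTEGERS 0x01u
#define FLAG_NO_FLOATS   0x02u

// Returns nonzero when the given flag bit is set in the mask.
int flag_is_set(uint8_t mask, uint8_t flag)
{
    return (mask & flag) != 0;
}
```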
Cleks provides several templates for well-known formats, such as JSON and Brainfuck. Templates can be found in the `templates` directory of the repository. When using a template, there is no need to create a `CleksConfig` yourself.
Here are a few examples of how to use Cleks:
```c
#include <string.h>  // for strlen
#include "cleks.h"

// to make life a lot easier
enum TestTokens{
    TEST_LT,
    TEST_GT,
    TEST_EQ,
    TEST_IF,
    TEST_THEN,
    TEST_LB,
    TEST_RB,
    TEST_PRINT
};

// define our custom tokens
CleksTokenConfig TestTokenConfig[] = {
    [TEST_LT] = {.print_string = "Less-Than", .word = "", .symbol = '<'},
    [TEST_GT] = {"Greater-Than", "", '>'},
    [TEST_EQ] = {"Equals", "", '='},
    [TEST_IF] = {"If", "if", '\0'},
    [TEST_THEN] = {"Then", "then", '\0'},
    [TEST_LB] = {"Left-Bracket", "", '('},
    [TEST_RB] = {"Right-Bracket", "", ')'},
    [TEST_PRINT] = {"Function-Print", "print", '\0'}
};

// define our custom comments
CleksComment TestComments[] = {
    {"//", "\n"},  // single-line comment from "//" to the end of the line
    {"#", "\n"},   // single-line comment from "#" to the end of the line
    {"/*", "*/"}   // multi-line comment from "/*" to "*/"
};

// define our custom strings
CleksString TestStrings[] = {
    {'"', '"'},  // string from '"' to '"'
    {'[', ']'}   // string from '[' to ']'
};

int main(void)
{
    CleksConfig TestConfig = {
        .default_tokens = CleksDefaultTokenConfig,
        .default_token_count = CLEKS_TOKEN_COUNT,
        .custom_tokens = TestTokenConfig,
        .custom_token_count = CLEKS_ARR_LEN(TestTokenConfig),
        .whitespaces = " \n",  // we skip ' ' and '\n'
        .strings = TestStrings,
        .string_count = CLEKS_ARR_LEN(TestStrings),
        .comments = TestComments,
        .comment_count = CLEKS_ARR_LEN(TestComments),
        .flags = CLEKS_FLAG_DEFAULT
    };
    char buffer[] = "if(x < 2)/*this is a comment */ // this is also a comment \nthen print(\"x+1\"[is greater than x by 1])";  // our test buffer
    CleksTokens *tokens = Cleks_lex(buffer, strlen(buffer), TestConfig);  // lex our buffer
    Cleks_print_tokens(tokens);  // print the found tokens using their `print_string` value
    Cleks_free_tokens(tokens);   // free the allocated memory
    return 0;
}
```
This will generate the following output:
```
Token count: 12
Token 3: If
Token 5: Left-Bracket
Token -3: Word "x"
Token 0: Less-Than
Token -2: Integer "2"
Token 6: Right-Bracket
Token 4: Then
Token 7: Function-Print
Token 5: Left-Bracket
Token -4: String "x+1"
Token -4: String "is greater than x by 1"
Token 6: Right-Bracket
```
In this example we use the template for JSON.
```c
#include <string.h>  // for strlen
#include "cleks.h"
#include "templates/cleks_json_template.h"  // include the template

int main(void)
{
    char buffer[] = "{\"nums\": [1, 2, 3], \"truth\": [true, false, null]}";  // our test buffer
    CleksTokens *tokens = Cleks_lex(buffer, strlen(buffer), JsonConfig);  // use the JsonConfig provided by the template
    Cleks_print_tokens(tokens);
    Cleks_free_tokens(tokens);
    return 0;
}
```
This will generate the following output:
```
Token count: 21
Token 0: JsonMapOpen: '{'
Token -4: String "nums"
Token 4: JsonMapSep: ':'
Token 2: JsonArrayOpen: '['
Token -2: Integer "1"
Token 5: JsonIterSep: ','
Token -2: Integer "2"
Token 5: JsonIterSep: ','
Token -2: Integer "3"
Token 3: JsonArrayClose: ']'
Token 5: JsonIterSep: ','
Token -4: String "truth"
Token 4: JsonMapSep: ':'
Token 2: JsonArrayOpen: '['
Token 6: JsonTrue: true
Token 5: JsonIterSep: ','
Token 7: JsonFalse: false
Token 5: JsonIterSep: ','
Token 8: JsonNull: null
Token 3: JsonArrayClose: ']'
Token 1: JsonMapClose: '}'
```
```c
// Tokenizes the input buffer according to the given config.
CleksTokens* Cleks_lex(char *buffer, size_t buffer_size, CleksConfig config);

// Frees all memory associated with the tokens structure.
void Cleks_free_tokens(CleksTokens *tokens);

// Prints all tokens in a human-readable format.
void Cleks_print_tokens(CleksTokens *tokens);

// Appends a token of a given type to the tokens list.
void Cleks_append_token(CleksTokens *tokens, CleksTokenType token_type, char *token_value);

// Returns the `print_string` of a token.
char* Cleks_token_to_string(CleksToken *token, CleksConfig config);

// Checks whether a token is valid (does not check the upper bound for custom tokens).
CLEKS_TOKEN_IS_VALID(token);

// Checks whether a token is custom.
CLEKS_TOKEN_IS_CUSTOM(token);
```