Toni-LSM
is a educational project to implement a simple kv database from scratch, using lsm-tree as the storage engine. The project manager uses xmake. The project is inspired by mini-lsm, tinykv and leveldb. The project is partly compatible with the Redis Resp protocol, so it can be used as a redis backend and relpace redis-server
(Just for fun ๐ฎ).
The developing posts can be found in my blog or zhihu. The is also a recorded video Course.
We now offer a complete hands-on step-by-step Lab course designed for learners interested in building LSM-based storage engines from scratch.
Project production is not easy, please click star โญ to support us โค๏ธ
๐ Toni-LSM Lab Course
You can also join the QQ group ๐ฌ for discussion:
The project uses xmake as the build system. Below is the xmake configuration for building the project and running tests:
- Compile the project
xmake
- Run the example program or test
xmake run example
xmake run test_lsm
- Install the shared library
xmake install --root lsm_shared
Here is a simple example demonstrating how to use the LSM Tree for basic key-value operations:
#include "../include/lsm/engine.h"
#include "../include/lsm/level_iterator.h"
#include <iostream>
#include <string>
using namespace ::toni_lsm;
int main() {
// create lsm instance, data_dir is the directory to store data
LSM lsm("example_data");
// put data
lsm.put("key1", "value1");
lsm.put("key2", "value2");
lsm.put("key3", "value3");
// Query data
auto value1 = lsm.get("key1");
if (value1.has_value()) {
std::cout << "key1: " << value1.value() << std::endl;
} else {
std::cout << "key1 not found" << std::endl;
}
// Update data
lsm.put("key1", "new_value1");
auto new_value1 = lsm.get("key1");
if (new_value1.has_value()) {
std::cout << "key1: " << new_value1.value() << std::endl;
} else {
std::cout << "key1 not found" << std::endl;
}
// delete data
lsm.remove("key2");
auto value2 = lsm.get("key2");
if (value2.has_value()) {
std::cout << "key2: " << value2.value() << std::endl;
} else {
std::cout << "key2 not found" << std::endl;
}
// iterator
std::cout << "All key-value pairs:" << std::endl;
// begin(id): id means transaction id, 0 means disable mvcc
for (auto it = lsm.begin(0); it != lsm.end(); ++it) {
std::cout << it->first << ": " << it->second << std::endl;
}
// transaction
auto tranc_hanlder = lsm.begin_tran(IsolationLevel::REPEATABLE_READ);
tranc_hanlder->put("xxx", "yyy");
tranc_hanlder->put("yyy", "xxx");
tranc_hanlder->commit();
auto res = lsm.get("xxx");
std::cout << "xxx: " << res.value() << std::endl;
lsm.clear();
return 0;
}
TODO
for the supported Redis commands.
xmake run server
Then you can use redis-cli to connect to the server:
The project is under development, and the current version is not stable.
We use the official tool redis-benchmark
to test the performance of the wrapper redis server. The QPS of commonly used commands are relatively high, considering its IO between memroy and disk.
๐ The testing environment is: Win11 WSL Ubuntu 22.04, 32GB 6000 RAM, Intel 12600K.
(base) โ ~ redis-benchmark -h 127.0.0.1 -p 6379 -c 100 -n 100000 -q -t SET,GET,INCR,SADD,HSET,ZADD
WARNING: Could not fetch server CONFIG
SET: 142653.36 requests per second, p50=0.527 msec
GET: 134589.50 requests per second, p50=0.503 msec
INCR: 132802.12 requests per second, p50=0.503 msec
SADD: 131233.59 requests per second, p50=0.519 msec
HSET: 123456.79 requests per second, p50=0.583 msec
ZADD: 126422.25 requests per second, p50=0.615 msec
๐ Besides, the QPS of
lpush/rpush
is so slow that its design needs to be optimized.
The performance of the wrapper redis server is not very good, but it is still fast enough for most use cases. Howerver, the redis server is built based on the LSM Tree KV engine, so it consists of some redundancy locks. If you use the LSM Tree KV engine directly, you can get much better performance. The reason why we use redis-benchmark
to test the compatible redis api but not the actual KV engine API is that writing a sophisticated testing tool costs much time so we choose to use the existed redis-benchmark
.
- SkipList
- get/put/remove
- iterator
- Range Query
- MemTable
- Iterator
- Range Query
- flush to sst
- SST
- Encode/Decode
- Iterator
- Query
- Range Query
- Compact
- Wal
- Sync Wal
- Async Wal
- Recover
- Transaction
- MVCC
- Isolation Level
- Read Uncommitted
- Read Committed
- Repeatable Read (Default)
- Serializable
- Config
- Toml Config
- Redis
- Fundamental KV Operations
- set/get
- ttl/expire
- Hash Operations
- hset/hget/hdel
- hkeys
- List Operations
- lpush/rpush/lpop/rpop
- llen
- lrange
- ZSet
- ZADD/ZREM/ZINCRBY
- ZCARD
- ZRANGE
- ZSCORE
- ZRANK
- Set
- SADD/SREM
- SMEMBERS/SISMEMBER
- SCARD
- IO Operations
- FLUSHALL
- SAVE
- SDK
- Python
- Fundamental KV Operations
๐ Only commonly used redis commands are supported. The other implementations can refer to
src/redis_wrapper/redis_wrapper.cpp
. If you need more commands, please submit a pull request.
Currently, the project only supports a subset of the Redis Resp protocol. The supported commands are listed above. You can add more commands by following the existing implementations in src/redis_wrapper/redis_wrapper.cpp
.
If you want to develop a new SDK for the project, please refer to the existing SDKs in src/sdk/
. Curently, we only have a Python SDK. You can use build the python sdk as following:
cd sdk
./build_python_sdk.sh
Then you can use the SDK in your Python code:
import tonilsm
db = tonilsm.LSM("test2_db")
db.put(b"tomxx", b"catxx")
db.get("tomxx")
t = db.begin_tran()
t.get('tomxx')
t.put('tomxx','1')
t.get('tomxx')
t.commit()
db.get("tomxx")
We welcome contributions for developing SDKs in other programming languages.
If you want to contribute to the project, please check the issues. Also, you can concat me by email๐ง
I have a plan to make this project a step-to-step Lab like CMU15445
or MIT 6.824
. If you are interested in this project, please feel free to contact me by email๐ง
Besides, the branch lab-dev
is a development branch for the lab construction.
This project is licensed under the MIT License.