The Declarative Data Generator
Synth is a tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data.
Synth answers a simple question: there are so many ways to consume data, so why are there no frameworks for generating data?
Synth provides a robust, declarative framework for specifying constraint-based data generation, solving the following problems developers regularly face:
- You're creating an app from scratch and have no way to populate your fresh schema with correct, realistic data.
- You're doing integration testing or QA on production data, but you know it is bad practice and you really shouldn't be.
- You want to see how your system will scale if your database suddenly holds 10x as much data.
Synth solves exactly these problems with a flexible declarative data model that you can version-control in Git, peer review, and automate.
The key features of Synth are:
- Data as Code: Data generation is described using a declarative configuration language, allowing you to specify your entire data model as code.
- Import from Existing Sources: Synth can import data from existing sources and automatically create data models. Synth currently has alpha support for Postgres, MySQL and MongoDB!
- Data Inference: While ingesting data, Synth automatically works out the relations, distributions and types of the dataset.
- Database Agnostic: Synth supports semi-structured data and is database agnostic, playing nicely with both SQL and NoSQL databases.
- Semantic Data Types: Synth uses the fake-rs crate to generate semantically rich data, with support for types like names, addresses, credit card numbers, etc.
- Alpha: We are testing `synth` with a closed set of users.
- Public Alpha: Anyone can install `synth`, but go easy on us; there are a few kinks.
- Public Beta: Stable enough for most non-enterprise use cases.
- Public: Production-ready.
We are currently in Public Alpha. Watch "releases" of this repo to get notified of major updates.
On Linux and macOS you can get started with the one-liner:
```bash
# Optional: set the install path
$ export SYNTH_INSTALL_PATH=~/bin
$ curl -sSL https://getsynth.com/install | sh
```
For more installation options, check out the docs.
To start generating data without a source to import from, add Synth schema files to a namespace directory.
To get started, we'll create a namespace directory for our data model and call it `my_app`:
```bash
$ mkdir my_app
```
Next, let's create a `users` collection using Synth's configuration language and put it into `my_app/users.json`:
```json
{
  "type": "array",
  "length": {
    "type": "number",
    "constant": 1
  },
  "content": {
    "type": "object",
    "id": {
      "type": "number",
      "id": {}
    },
    "email": {
      "type": "string",
      "faker": {
        "generator": "safe_email"
      }
    },
    "joined_on": {
      "type": "date_time",
      "format": "%Y-%m-%d",
      "subtype": "naive_date",
      "begin": "2010-01-01",
      "end": "2020-01-01"
    }
  }
}
```
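To make the schema's semantics concrete, here is a toy Python interpreter for just the node types used in `users.json`. It is a minimal sketch under stated assumptions, not Synth's engine: the `generate` and `sample` helpers are hypothetical, the email branch is a placeholder standing in for fake-rs, ids are assumed to count up from 1, and `sample` assumes a requested size is reached by resampling the collection, which may not match Synth's exact `--size` semantics.

```python
import json
import random
from datetime import datetime, timedelta

# The my_app/users.json schema, embedded so the sketch is self-contained.
USERS_SCHEMA = json.loads("""
{
  "type": "array",
  "length": {"type": "number", "constant": 1},
  "content": {
    "type": "object",
    "id": {"type": "number", "id": {}},
    "email": {"type": "string", "faker": {"generator": "safe_email"}},
    "joined_on": {"type": "date_time", "format": "%Y-%m-%d",
                  "subtype": "naive_date", "begin": "2010-01-01", "end": "2020-01-01"}
  }
}
""")


def generate(node, rng, counters, path="$"):
    """Recursively turn one schema node into a value (toy subset, not Synth's engine)."""
    kind = node["type"]
    if kind == "array":
        length = generate(node["length"], rng, counters, path + ".length")
        return [generate(node["content"], rng, counters, path + "[]") for _ in range(length)]
    if kind == "object":
        # Every key other than "type" names a field of the generated object.
        return {name: generate(child, rng, counters, f"{path}.{name}")
                for name, child in node.items() if name != "type"}
    if kind == "number" and "constant" in node:
        return node["constant"]
    if kind == "number" and "id" in node:
        # Assumed behaviour: sequential ids starting at 1, one counter per schema position.
        counters[path] = counters.get(path, 0) + 1
        return counters[path]
    if kind == "string" and "faker" in node:
        # Placeholder for fake-rs: produces a fixed-pattern address, not realistic data.
        return f"user{rng.randrange(1000)}@example.com"
    if kind == "date_time":
        fmt = node["format"]
        begin = datetime.strptime(node["begin"], fmt)
        end = datetime.strptime(node["end"], fmt)
        days = rng.randrange((end - begin).days)  # uniform day in [begin, end)
        return (begin + timedelta(days=days)).strftime(fmt)
    raise ValueError(f"unsupported schema node at {path}: {kind}")


def sample(schema, size, seed=0):
    """Hypothetical: resample the collection until it holds `size` items."""
    rng, counters, items = random.Random(seed), {}, []
    while len(items) < size:
        items.extend(generate(schema, rng, counters))
    return items[:size]


print(json.dumps({"users": sample(USERS_SCHEMA, size=2)}, indent=2))
```

Each schema node maps to one branch of `generate`, which is the core idea of a declarative model: the JSON describes what the data looks like, and the generator decides how to produce it.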
Finally, generate data using the `synth generate` command:
```bash
$ synth generate my_app/ --size 2 | jq
{
  "users": [
    {
      "email": "patricia40@jordan.com",
      "id": 1,
      "joined_on": "2014-12-14"
    },
    {
      "email": "benjamin00@yahoo.com",
      "id": 2,
      "joined_on": "2013-04-06"
    }
  ]
}
```
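Because the output is plain JSON, it is easy to check (for example in CI) that generated data still respects the schema's constraints. A minimal sketch in Python, run against the sample output of `synth generate my_app/ --size 2`; the `check_users` helper and the specific checks are illustrative assumptions, not part of Synth, and the date range is treated as inclusive on both ends to stay lenient.

```python
import json
from datetime import date

# Sample output of `synth generate my_app/ --size 2`, copied from the README.
OUTPUT = """
{"users": [
  {"email": "patricia40@jordan.com", "id": 1, "joined_on": "2014-12-14"},
  {"email": "benjamin00@yahoo.com", "id": 2, "joined_on": "2013-04-06"}
]}
"""


def check_users(doc, begin=date(2010, 1, 1), end=date(2020, 1, 1)):
    """Return a list of constraint violations (empty if the data looks valid)."""
    problems = []
    users = doc["users"]
    ids = [u["id"] for u in users]
    if len(set(ids)) != len(ids):
        problems.append("duplicate ids")
    for u in users:
        joined = date.fromisoformat(u["joined_on"])  # raises ValueError on a bad format
        if not begin <= joined <= end:
            problems.append(f"id {u['id']}: joined_on {joined} out of range")
        if u["email"].count("@") != 1:
            problems.append(f"id {u['id']}: malformed email")
    return problems


print(check_users(json.loads(OUTPUT)))  # → []
```

The same check could be pointed at a file produced by redirecting a larger `synth generate` run.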
If you have an existing database, Synth can automatically generate a data model by inspecting it.
You can use the `synth import` command to automatically generate Synth schema files from your Postgres, MySQL or MongoDB database:
```bash
$ synth import tpch --from postgres://user:pass@localhost:5432/tpch
Building customer collection...
Building primary keys...
Building foreign keys...
Ingesting data for table customer... 10 rows done.
```
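The import step infers a data model from existing rows. As a rough illustration of the idea only (Synth's real inference is more involved and also works out primary keys and relations), the toy `infer_schema` function below looks at a few made-up, TPC-H-like `customer` rows and emits a Synth-style schema with column types and simple distributions. The `"range"` (`low`/`high`) and `"categorical"` fields are assumptions about the schema shape used for illustration; check the Synth docs before relying on them.

```python
import json
from collections import Counter

# Made-up rows shaped loosely like the TPC-H `customer` table.
ROWS = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING", "c_acctbal": 711.56},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE", "c_acctbal": 121.65},
    {"c_custkey": 3, "c_mktsegment": "BUILDING", "c_acctbal": 7498.12},
]


def infer_column(values):
    """Infer a schema node for one column from its observed values."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        # Numbers: remember the observed range.
        return {"type": "number", "range": {"low": min(values), "high": max(values)}}
    if all(isinstance(v, str) for v in values):
        # Strings: keep the empirical distribution as category -> count weights.
        return {"type": "string", "categorical": dict(Counter(values))}
    raise TypeError("mixed or unsupported column types")


def infer_schema(rows):
    """Build an array-of-objects schema whose length matches the row count."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    content = {"type": "object"}
    content.update({name: infer_column(vals) for name, vals in columns.items()})
    return {"type": "array",
            "length": {"type": "number", "constant": len(rows)},
            "content": content}


print(json.dumps(infer_schema(ROWS), indent=2))
```

Note that this toy treats `c_custkey` as a plain numeric range; a real importer would recognise it as a primary key, which is what the "Building primary keys..." step refers to.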
Finally, generate data into another instance of Postgres:
```bash
$ synth generate tpch --to postgres://user:pass@localhost:5433/tpch
```
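Generating into a fresh database only works if child rows reference parent keys that actually exist, which is the constraint the "Building foreign keys..." step of the import is concerned with. The sketch below illustrates that constraint with two hypothetical, TPC-H-like tables: customers are generated first, and each order's `o_custkey` is drawn from the customer keys already generated. The helper names are hypothetical, and this shows the idea rather than how Synth implements it.

```python
import random


def generate_customers(n):
    """Parent table: sequential primary keys starting at 1."""
    return [{"c_custkey": k} for k in range(1, n + 1)]


def generate_orders(n, customers, rng):
    """Child table: every o_custkey is sampled from existing parent keys."""
    parent_keys = [c["c_custkey"] for c in customers]
    return [{"o_orderkey": k, "o_custkey": rng.choice(parent_keys)}
            for k in range(1, n + 1)]


rng = random.Random(42)
customers = generate_customers(10)
orders = generate_orders(50, customers, rng)

# Referential integrity: no order points at a missing customer.
known = {c["c_custkey"] for c in customers}
assert all(o["o_custkey"] in known for o in orders)
print(len(customers), "customers,", len(orders), "orders")
```

Generating parents before children is the simplest ordering that guarantees valid references; cyclic relations would need a different strategy.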
We decided to build Synth from the ground up in Rust. We love Rust, and given the scale of data we wanted `synth` to generate, it made sense as a first choice. The combination of memory safety, performance, expressiveness and a great community made it a no-brainer, and we've never looked back!
If you would like to learn more, or would like support for your use case, feel free to open an issue on GitHub.
If your query is more sensitive, you can email opensource@getsynth.com and we'll happily chat about your use case.
The Synth project is backed by OpenQuery. We are a Y Combinator-backed startup based in London, England. We are passionate about data privacy, developer productivity, and building great tools for software engineers.
First of all, we sincerely appreciate all contributions to Synth, large or small, so thank you.
See the contributing section for details.
Synth is source-available and licensed under the Apache 2.0 License.
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!