If you need help, please ask your question on StackOverflow with the kiba-etl tag so that others can benefit from your contribution! I monitor this specific tag and will reply to you.
Writing reliable, concise, well-tested & maintainable data-processing code is tricky.
Kiba lets you define and run such high-quality ETL (Extract-Transform-Load) jobs using Ruby.
Learn more on the Wiki, on my blog and on StackOverflow.
Kiba 2.0.0.rc1 is available for testing (install it with `gem install kiba --prerelease`).
Kiba 2 introduces a new, opt-in engine called the `StreamingRunner`, which allows transforms to generate an arbitrary number of rows. This drastically improves the reusability & composability of Kiba components (see #44 for some background).
To use the `StreamingRunner`, opt in when declaring your job:
```ruby
# activate the new Kiba internal config system
extend Kiba::DSLExtensions::Config
# opt in to the new engine
config :kiba, runner: Kiba::StreamingRunner

# write a transform class able to yield an arbitrary number of rows
class MyYieldingTransform
  def process(row)
    # parentheses are required: `yield {key: 1}` parses the braces as a block
    yield({ key: 1 })
    yield({ key: 2 })
    # the returned row is passed down the pipeline as well
    { key: 3 }
  end
end
```
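To make the streaming behaviour concrete, here is a simplified, hypothetical sketch in plain Ruby (no Kiba dependency) of how a runner can propagate rows yielded by a transform down the chain. It is not Kiba's actual implementation, and `ExplodeTags`, `UpcaseTag` and `stream_through` are illustrative names. Each transform wraps the previous stream in an `Enumerator`: rows passed to the block and any non-nil return value are both emitted downstream, so one input row can fan out into many (the "explode multivalued attributes" case).

```ruby
# Hypothetical transform: emits one row per value of the multivalued :tags attribute.
class ExplodeTags
  def process(row)
    row.fetch(:tags).each { |tag| yield(row.merge(tags: tag)) }
    nil # every output row was yielded; returning nil emits nothing more
  end
end

# Hypothetical "classic" transform: returns exactly one row.
class UpcaseTag
  def process(row)
    row.merge(tags: row.fetch(:tags).upcase)
  end
end

# Simplified streaming runner: each transform wraps the previous stream
# in an Enumerator, so rows flow one at a time through the whole chain.
def stream_through(rows, transforms)
  transforms.reduce(rows.each) do |stream, transform|
    Enumerator.new do |out|
      stream.each do |row|
        # rows passed to the block are emitted immediately...
        returned = transform.process(row) { |yielded| out << yielded }
        # ...and so is the return value, unless it is nil (row dropped)
        out << returned unless returned.nil?
      end
    end
  end
end

rows = [{ id: 1, tags: ["a", "b"] }, { id: 2, tags: ["c"] }]
result = stream_through(rows, [ExplodeTags.new, UpcaseTag.new]).to_a
# result is [{ id: 1, tags: "A" }, { id: 1, tags: "B" }, { id: 2, tags: "C" }]
```

Because a plain transform simply ignores the block it is given, yielding and non-yielding transforms can be mixed freely in the same chain.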
The improved runner is compatible with Ruby 2.0+.
Kiba 2 is expected to be compatible with existing Kiba scripts, as long as they do not rely on internal APIs.
Internal changes include:
- An opt-in `config` system inspired by Elixir's Mix, currently only used to select the runner you want at job declaration time.
- Stronger isolation in the `Parser`, to reduce the chances that ETL scripts conflict with Kiba internal classes.
Documentation and articles:
- How do you define ETL jobs with Kiba?
- How do you run your ETL jobs?
- Implementing ETL sources.
- Implementing ETL transforms.
- Implementing ETL destinations.
- Implementing pre and post-processors.
- Live Coding Session - Processing data with Kiba ETL
- Rubyists - are you doing ETL unknowingly?
- How to write solid data processing code
- How to reformat CSV files with Kiba (in-depth, hands-on tutorial)
- How to explode multivalued attributes with Kiba ETL?
- Common techniques to compute aggregates with Kiba
- How to run Kiba in a Rails environment?
- How to pass parameters to the Kiba command line?
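For readers new to Kiba, here is a minimal sketch of the component contract those pages describe: a source responds to `each` and yields rows, a transform's `process` returns the row (or `nil` to discard it), and a destination receives one `write(row)` call per row followed by `close`. In a Kiba job these classes would be declared with `source`, `transform` and `destination`; here a hypothetical `run_pipeline` helper stands in for the Kiba runner so the sketch runs without the gem. All class names are illustrative.

```ruby
# Source: instantiated with the arguments given to `source`; `each` yields rows.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield(row) }
  end
end

# Transform: returns the (possibly modified) row, or nil to drop it.
class RejectBlankNames
  def process(row)
    name = row[:name].to_s.strip
    name.empty? ? nil : row.merge(name: name)
  end
end

# Destination: `write` is called once per row, `close` once at the end.
class ArrayDestination
  def initialize(output)
    @output = output
    @closed = false
  end

  def write(row)
    @output << row
  end

  def close
    @closed = true # a real destination would flush or close a file here
  end

  def closed?
    @closed
  end
end

# Hypothetical stand-in for the Kiba runner: source -> transforms -> destination.
def run_pipeline(source, transforms, destination)
  source.each do |row|
    # stop applying transforms as soon as one returns nil
    row = transforms.reduce(row) { |current, t| current && t.process(current) }
    destination.write(row) if row
  end
  destination.close
end

output = []
destination = ArrayDestination.new(output)
run_pipeline(
  ArraySource.new([{ name: " Alice " }, { name: "" }, { name: "Bob" }]),
  [RejectBlankNames.new],
  destination
)
# output is [{ name: "Alice" }, { name: "Bob" }]
```

Since components are plain Ruby objects with a tiny interface, each one can be unit-tested in isolation, which is a large part of what makes such jobs maintainable.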
Kiba currently supports Ruby 2.0+ and JRuby (with its default 1.9 syntax). See test matrix.
I'm starting to add commonly used, reusable helpers to a separate gem called kiba-common; check it out (work in progress).
Consulting services: if your organization needs help to implement a data pipeline or to build a data-intensive application, I provide consulting services. More information.
Kiba Pro: for more features & goodies, check out Kiba Pro (Changelog & contact info).
Copyright (c) LoGeek SARL. Kiba is an Open Source project licensed under the terms of the LGPLv3 license. Please see http://www.gnu.org/licenses/lgpl-3.0.html for license text.
(agreement below borrowed from Sidekiq Legal)
By submitting a Pull Request, you disavow any rights or claims to any changes submitted to the Kiba project and assign the copyright of those changes to LoGeek SARL.
If you cannot or do not want to reassign those rights (your employment contract may not allow this), you should not submit a PR. Open an issue and someone else can do the work.
This is a legal way of saying "If you submit a PR to us, that code becomes ours". 99.9% of the time that's what you intend anyway; we hope it doesn't scare you away from contributing.