Keboola is proud to announce the addition of dbt to the transformation layer. Now anyone who knows SQL can build production-grade data pipelines by embracing the software engineering approach.
So what exactly is dbt, and more importantly, why is everyone talking about it?
dbt (data build tool) is an open-source tool that simplifies data transformation by following software engineering best practices such as modularity, portability, CI/CD, and documentation.
dbt empowers engineers and analysts to transform data in the warehouse by writing SQL SELECT statements as models, which dbt then materializes as datasets (tables or views). The Modern Data Stack architecture has adopted this standard as best practice.
First, let's look at 7 reasons why companies choose to make dbt part of their Modern Data Stack:
Engineering principles. By implementing dbt, engineers also bring software engineering principles to the data world. Features like testing, code review, and CI/CD support the controlled change management practices familiar from software engineering.
SQL. Plain and simple, dbt is a great tool because you can run many data engineering tasks using the well-known SQL language. By writing code in SQL, modularizing it into models, and using templating engines like Jinja, you can write transformation logic once and reuse it in multiple places.
Out-of-the-box testing. Deploy tests in SQL alongside your transformations to assert data integrity, referential constraints, and semantic validity.
Documentation. Document the code as you write it to create a knowledge repository of all your data models, including data lineage.
Versioning. Integrate dbt with git to keep track of all data model changes. You can always revert to the previous version if you mess anything up.
Deployment. Deploy dbt transformations as part of your CI/CD pipelines. Even though it is written with SQL users in mind, it can be deployed under the DevOps practices of job orchestration.
Production-ready. Organize your work with dbt’s repositories so you can set up a development, staging, and production environment, to run transformations under the same production-level standards as other software projects.
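To make the SQL-first workflow above more concrete, here is a minimal sketch in Python. It is not dbt itself and uses no dbt API: it relies on Python's built-in sqlite3 module as a stand-in warehouse, and the table names (raw_orders, stg_orders, customer_revenue) and helper functions (build_model, failing_rows) are hypothetical. It illustrates two ideas: a model is a SELECT statement materialized as a dataset, and a data test is a query that returns the rows violating an expectation, so the test passes when no rows come back.

```python
import sqlite3


def build_model(conn, name, select_sql):
    """Materialize a SELECT statement as a table (a stand-in for a model)."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def failing_rows(conn, test_sql):
    """Run a test query; any rows returned are violations."""
    return conn.execute(test_sql).fetchall()


# Stand-in warehouse with a small, made-up source table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, 40.0), (3, 11, 15.0), (4, None, 99.0)],
)

# Models build on each other: a cleaned staging model, then an aggregate.
build_model(
    conn,
    "stg_orders",
    "SELECT id, customer_id, amount FROM raw_orders "
    "WHERE customer_id IS NOT NULL",
)
build_model(
    conn,
    "customer_revenue",
    "SELECT customer_id, SUM(amount) AS revenue "
    "FROM stg_orders GROUP BY customer_id",
)

# Tests live next to the models and are written in SQL as well.
tests = {
    "customer_id_not_null":
        "SELECT * FROM customer_revenue WHERE customer_id IS NULL",
    "customer_id_unique":
        "SELECT customer_id FROM customer_revenue "
        "GROUP BY customer_id HAVING COUNT(*) > 1",
    "revenue_non_negative":
        "SELECT * FROM customer_revenue WHERE revenue < 0",
}

results = {name: failing_rows(conn, sql) for name, sql in tests.items()}
for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} rows)"
    print(f"{name}: {status}")
```

In a real dbt project, the models would live in .sql files and the tests would be declared alongside them. The sketch only mirrors the idea that transformations and their checks are both plain SQL kept under version control.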
The benefits of dbt might seem abstract, but for data practitioners who like to get their hands dirty with the nitty-gritty of data work, dbt speeds up the entire process of creating and launching transformations in their ETL and ELT pipelines while upholding data quality standards.
dbt is also not the only transformation engine Keboola supports. Within its transformation backend, you can run data cleaning and aggregation tasks with SQL, Python, R, Julia and other code.
By integrating with both dbt Core and the dbt Cloud service, Keboola gives your analytics engineers more tools and options to make their work smoother, while also allowing them to mix and match other languages and frameworks that are better suited to other problems (e.g. Python or R for data science).
But Keboola, as always, takes it even a notch further. So let's finally get to the 7 reasons why we integrated dbt - as we hinted in the headline!
Keboola helps you get more engineering value from dbt
By using Keboola+dbt you can achieve more than just integrating dbt into your end-to-end ETLT data pipelines:
Easy setup of the development environment. Keboola sets up a development environment with a single CLI command, so users can start developing quickly and safely. Keboola provisions SQL workspaces with read-only access to all existing project data and an isolated development environment, accessible from any SQL editor, dbt Cloud, or the Keboola UI. The CLI also simplifies the creation of data source definitions and automatic tests, saving developers time.
Ability to orchestrate dbt in the context of full data pipelines. The Orchestrator can schedule and run all the components needed in a full data pipeline, such as extractors (data loaders) and transformation code (dbt, Python, R, Julia, etc.).
Data governance. Access job monitoring and metadata as first-class citizens. Keboola creates immutable job logs for every transformation run, so you can include the metadata of your operations in your analytic modeling and get full observability without any extra costs. The Keboola platform also stores all artifacts in the built-in metastore and provides insights based on them, such as model timing, dbt docs, and compiled SQL.
Fine-grained control of running transformations. With Keboola, you have more control over when and how you launch your dbt jobs. The Orchestrator sets your transformation jobs on a schedule and Cloud triggers help you run event-driven transformations. The separation of execution and modeling gives you more control and insight into how your transformations are running.
Separation of execution, testing, and modeling. With the combination of Keboola development branches and the ability to use different dbt code branches, you can separate the development (modeling), testing, and production environments and validate results of test runs.
Scalable infrastructure without the DevOps nightmares. Keboola can help you scale the infrastructure running your transformations without you needing to worry about the maintenance and operational hiccups. Customize backend resources with a couple of clicks, run multithreaded transformations without coding them yourself, and remove bottlenecks by defining incremental or bulk output behavior of your transformations.
Simplified but powerful collaboration. Sharing data, models, and code is a problem of the past. Keboola allows you to develop rule-based collaboration permissions that simplify sharing of assets at any granularity you need (job, dataset, project, development branch, or entire ecosystem). Headaches with work handoffs between teams are a thing of the past.
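The orchestration point above, running dbt as one step in a full pipeline, can be illustrated with a small sketch. This is not Keboola's Orchestrator or its API: the step names and the run_order helper are hypothetical, and the example only shows how declaring dependencies between extractors, dbt runs, and downstream code yields a valid execution order. It uses Python's standard graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "dbt_run": {"extract_orders", "extract_customers"},
    "dbt_test": {"dbt_run"},
    "python_forecast": {"dbt_test"},
}


def run_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


order = run_order(pipeline)
print(" -> ".join(order))
```

Both extractors come before dbt_run, the tests run after the models are built, and the Python step runs only once the tested data is available. A cyclic dependency would raise graphlib.CycleError instead of producing an order.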
Keboola + dbt help you streamline data operations while keeping the entire operation observable and customizable.
But what if you wanted to break down silos and empower your non-engineers with data as well?
Make the most out of your Modern Data Stack
dbt is tailored to the “software engineering” persona: the professional who likes to get their hands dirty with the nitty-gritty aspects of data as code.
But Keboola, as THE Data Platform as a Service, is big on data democratization, the idea that anyone can work with the Modern Data Stack and contribute to data operations. With Keboola, you can:
Run data as code with the Keboola-as-code features, such as the dbt integration, CI/CD pipelines, and other features tailored to the engineering professionals.
Automate your data pipelines via API. Keboola makes machine-to-machine automation easy, without having to re-invent the REST wheel.
Empower your domain experts to build data pipelines and utilize their domain knowledge to the fullest, even when they’re not fluent in code. The Visual Flow Builder helps them build ETL pipelines through a drag-and-drop interface and no-code transformations (integrated with dbt) help them self-serve their analytic needs in a couple of clicks.
Democratization of data access is just one of the many advantages. With Keboola you get:
A Data Platform as a Service. You can pick just some tools, the entire toolbox, or bring your own tools. The plug-and-play design is fully customizable and extends with your data needs.
Tool integration. The single platform allows you to connect the dots between your different data silos. No need to duplicate work with one DevOps setup for data science and another for integration engineering. All your tools can be run within a single platform, helping you bridge silos between departments and foster collaboration.
End-to-end observability and data governance. As your operations grow, so do your monitoring and governance pains. With data stacks covering a dispersed array of tools, it is hard to see what is failing technically (monitoring) and which processes are failing (governance). With Keboola, you can monitor and trace lineage throughout your stack, as well as enforce data governance principles and policies to keep your operations compliant with regulations.
Security by design. When the modern data stack grows, so does your security risk. Each tool requires you to give, monitor, and revoke permissions and access within their own security ecosystem. With Keboola, you can bring all the security concerns under a single roof, and quickly administer your tooling from a unified and secure command center.
Upgrade your Data Stack with Keboola + dbt
Build, run, and scale end-to-end data pipelines with Keboola. Deploy your data transformations with one of the best tools on the market, dbt, all without increasing complexity, losing observability, or opening yourself up to regulatory risks.