Keboola's virtual branching enables testing without affecting production, and Keboola also supports Git-based workflows for enterprise setups.
Keboola offers a robust virtual branching environment, enabling users to seamlessly create a shadow copy of their entire Keboola Project. This allows for the development and testing of changes without affecting the production environment. These development branches contain a copy of the underlying project data, ensuring that when a pipeline is executed in the development branch, the production data remains unaffected. The user interface provides a straightforward way to merge changes from the branch to production or update the branch with the latest production data.
This branching environment is typically sufficient for small to medium projects. However, in an enterprise setup, it may be necessary to have completely separate environments where both data and data pipeline definitions (code) are isolated. In such setups, administrators may need to define complex “branch protection” rules to closely control who can release new features into the production environment, as well as how and when these releases occur. In the software engineering world, this is often achieved with version control systems like Git.
Thanks to the Keboola CLI, it is possible to define and synchronize separate environments entirely via Git, including those built on a multi-project architecture. This gives users the freedom to establish deployment rules according to their needs and makes it possible to test entire pipelines across multiple projects in completely isolated environments.
Let’s say our business works with sensitive financial and client data, and our security policy requires that only certain users can access production data. This means development must happen on test data that has the same structure as production data but is anonymized and carries no real value. In this scenario, we structure our Keboola projects into three stages (projects): L0, L1, and L2.
The data pipeline runs through the L0 to L2 projects, and its outputs are consumed by Tableau reporting.
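To illustrate the anonymization requirement, here is a minimal Python sketch that masks sensitive columns while keeping the row structure unchanged. It assumes that a keyed hash (HMAC-SHA256) is an acceptable pseudonymization technique for your data; the column names, the key, and the function names are hypothetical, and this is not how Keboola itself prepares test data.

```python
import hashlib
import hmac

# Hypothetical set of columns treated as sensitive; adjust to your data model.
SENSITIVE_COLUMNS = {"client_name", "email", "iban"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a truncated keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize_row(row: dict, key: bytes) -> dict:
    """Return a copy of the row with the same columns; sensitive values are masked."""
    return {
        column: pseudonymize(str(value), key) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    # Hypothetical key; it should be known only to whoever prepares the test data.
    key = b"replace-with-a-secret-key"
    prod_row = {"client_id": 42, "client_name": "Jane Doe", "email": "jane@example.com", "balance": 1200.5}
    print(anonymize_row(prod_row, key))
```

Because the hash is deterministic for a given key, the same client maps to the same token in every table, so joins along the L0 to L2 pipeline keep working. Keyed hashing is pseudonymization rather than full anonymization, so it should be checked against your own compliance requirements.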
Aside from these Keboola projects, the infrastructure also includes other technologies, for example, a MySQL application database and Tableau reporting. When a large new feature is rolled out, its deployment is synchronized across all environments at once via Jenkins.
Let’s consider two environments: PROD and DEV/TEST. The company already has CI/CD pipelines in place that can deploy MySQL and reporting changes together through a Jenkins pipeline. All of these deployments are linked to the Jira ticket of the feature they belong to. Once the feature is approved (e.g., all related PRs have been approved), it is deployed automatically across systems. Our task is to include the Keboola projects in that flow.
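The approval gate described above can be sketched in a few lines of Python. This is a hypothetical model, not the company's Jenkins pipeline or the Jira API: each pull request is represented by a `PullRequest` record carrying the Jira key it is linked to, and a feature is treated as releasable only when it has at least one linked PR and all of them are approved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PullRequest:
    repo: str      # hypothetical repository name, e.g. "mysql-schema" or "keboola-projects"
    jira_key: str  # Jira ticket the PR is linked to, e.g. "FIN-123"
    approved: bool


def ready_to_release(jira_key: str, pull_requests: list[PullRequest]) -> bool:
    """True only if the feature has linked PRs and every one of them is approved."""
    linked = [pr for pr in pull_requests if pr.jira_key == jira_key]
    return bool(linked) and all(pr.approved for pr in linked)


if __name__ == "__main__":
    prs = [
        PullRequest("mysql-schema", "FIN-123", approved=True),
        PullRequest("tableau-reports", "FIN-123", approved=True),
        PullRequest("keboola-projects", "FIN-123", approved=False),
    ]
    print(ready_to_release("FIN-123", prs))  # False: the Keboola PR is still pending
```

The `bool(linked)` check keeps a ticket with no linked PRs from passing the gate by default, since `all()` over an empty list returns `True`.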
Using the CLI, we can create clones of all three projects without data and synchronize them together via Git. The aim is to create two environments:
PROD: Production set of projects.
DEV/TEST: Development and test environment where new features are developed; once they are ready and tested, they are released (merged) into production. The feature spans all three projects, so changes in all of them are deployed together in a single release.
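One way to picture the two environments is as plain data: each environment points to a Git branch and a Keboola stack, and maps every pipeline stage (L0, L1, L2) to its own project. The following Python sketch is illustrative only; the class, branch names, stack host, and project IDs are hypothetical and do not reflect the format used by the Data App or the CLI. It also checks that no project is shared between environments, which is the isolation this setup relies on.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STAGES = ("L0", "L1", "L2")  # pipeline stages from the scenario


@dataclass
class Environment:
    name: str
    git_branch: str
    stack: str  # Keboola stack host
    projects: dict[str, int] = field(default_factory=dict)  # stage -> project ID

    def missing_stages(self) -> list[str]:
        """Stages that have no project mapped in this environment."""
        return [stage for stage in STAGES if stage not in self.projects]


def shared_projects(a: Environment, b: Environment) -> set[int]:
    """Project IDs used by both environments; should be empty for full isolation."""
    return set(a.projects.values()) & set(b.projects.values())


if __name__ == "__main__":
    # Hypothetical branch names, stack host, and project IDs.
    prod = Environment("PROD", "main", "connection.keboola.com", {"L0": 101, "L1": 102, "L2": 103})
    dev = Environment("DEV/TEST", "dev", "connection.keboola.com", {"L0": 201, "L1": 202, "L2": 203})
    print(prod.missing_stages(), dev.missing_stages(), shared_projects(prod, dev))
```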
To implement the setup suggested above, we need the following tools:
Before every release, the state of the development branch must represent the desired release state. This requires coordination within the team to freeze any work before the development branch is merged and ensure the development branch only contains the changes required in the release.
To simplify this process, traditional Keboola Branches may be used to keep the DEV branch clean, as depicted in the diagram above.
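Part of that coordination can be automated. As a minimal Python sketch, assume the paths changed in the DEV branch are already known (for example from `git diff --name-only main...dev`, where `main` and `dev` are hypothetical branch names) and that the release scope is a list of configuration directories taken from the Jira ticket; anything changed outside that scope is flagged before the merge. The directory layout and function names are hypothetical and do not necessarily match the layout the Keboola CLI produces.

```python
from __future__ import annotations


def in_scope(path: str, scope_dirs: list[str]) -> bool:
    """True if the path equals one of the scope directories or lies inside one."""
    return any(path == d or path.startswith(d.rstrip("/") + "/") for d in scope_dirs)


def unexpected_changes(changed_paths: list[str], scope_dirs: list[str]) -> list[str]:
    """Changed paths that are not part of the planned release, sorted for review."""
    return sorted(p for p in changed_paths if not in_scope(p, scope_dirs))


if __name__ == "__main__":
    # Hypothetical paths: one top-level directory per project (L0, L1, L2).
    changed = [
        "L1/main/transformation/orders/config.json",
        "L2/main/writer/tableau-orders/config.json",
        "L0/main/extractor/crm/config.json",
    ]
    release_scope = ["L1/main/transformation/orders", "L2/main/writer/tableau-orders"]
    print(unexpected_changes(changed, release_scope))  # ['L0/main/extractor/crm/config.json']
```

Matching on a trailing `/` keeps a sibling directory such as `orders-v2` from being treated as part of the `orders` scope.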
We have prepared a sample Streamlit application that can be deployed as a Data App in the Keboola environment to help with the initialization process.
This app allows you to define each environment by specifying its name, the related Git branch, and the Keboola stack.
Set project mappings:
Once you have finished, you can generate a ZIP file with all the GitHub Actions for the repository and set them up following the detailed instructions in the GitHub setup:
We have prepared a set of example GitHub Actions that facilitate synchronization between environments.
Sync all projects from the selected environment (set of Keboola projects) into the respective GitHub branch. The sync will result in a new commit in the selected branch that will contain the results of the validations against the destination environment.
Use this action when you wish to synchronize the environment into the GitHub repository, e.g., before release.
If you select a destination environment, the resulting commit will contain a validation report and a checklist that needs to be completed before merging into the destination environment.
Sync all project definitions from the selected environment (GitHub branch) into the respective Keboola Project via `kbc push`. The sync will replace all destination project configurations with the definitions in the related Git branch.
WARNING: This action will overwrite the state of the destination projects. Make sure you pull the changes first. We do not recommend using this action unless you are deploying changes via environments.
Use this action when you wish to synchronize the environment from the GitHub repository into the related Keboola projects, e.g., during release.
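The push action runs `kbc push`; assuming the pull action relies on the corresponding `kbc pull` command, both come down to running the Keboola CLI in each project's directory. The Python sketch below only builds and optionally runs that command sequence. It assumes a hypothetical repository layout with one already initialized CLI directory per stage (L0, L1, L2) and is not the code of the provided GitHub Actions.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

STAGES = ("L0", "L1", "L2")  # hypothetical: one CLI project directory per stage


def sync_plan(action: str, repo_root: Path) -> list[tuple[Path, list[str]]]:
    """Build the (working directory, command) pairs for syncing every project."""
    if action not in ("pull", "push"):
        raise ValueError(f"unsupported action: {action}")
    return [(repo_root / stage, ["kbc", action]) for stage in STAGES]


def run_plan(plan: list[tuple[Path, list[str]]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cwd, command in plan:
        if dry_run:
            print(f"[{cwd}] {shlex.join(command)}")
        else:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_plan(sync_plan("push", Path(".")))
```

Dry run is the default here because a push overwrites the destination projects; executing the plan for real should only happen after the destination state has been pulled and committed.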
Environment secrets:
Environment variables
Prior to deployment, make sure that all changes in the Storage structure that cannot be replicated automatically are applied in the destination environment. The suggested supporting GitHub action generates a checklist of these changes for the reviewer. They can be applied manually or by using the storage merger application in each DEV/TEST project; the application will then run in production after deployment.
Note that most Storage changes, such as table creation, are performed by the components themselves, so no action is needed for them.
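To show what such a checklist might look like, here is a minimal Python sketch that compares the table structures of two environments and renders a Markdown checklist. The input format (table ID mapped to a list of column names), the table IDs, and the rule that only column differences need manual action are assumptions for illustration; the supporting GitHub action may apply different rules.

```python
from __future__ import annotations


def storage_checklist(source: dict[str, list[str]], destination: dict[str, list[str]]) -> str:
    """Render a Markdown checklist of Storage differences between two environments."""
    lines = ["### Storage changes"]
    # New tables are assumed to be created by their components (no action needed).
    for table in sorted(source.keys() - destination.keys()):
        lines.append(f"- New table `{table}` (assumed to be created by its component)")
    # Column differences in existing tables are treated as manual checklist items.
    for table in sorted(source.keys() & destination.keys()):
        for column in source[table]:
            if column not in destination[table]:
                lines.append(f"- [ ] Add column `{column}` to `{table}`")
        for column in destination[table]:
            if column not in source[table]:
                lines.append(f"- [ ] Remove column `{column}` from `{table}`")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table structures in DEV/TEST (source) and PROD (destination).
    dev = {"out.c-sales.orders": ["id", "amount", "currency"], "out.c-sales.refunds": ["id"]}
    prod = {"out.c-sales.orders": ["id", "amount"]}
    print(storage_checklist(dev, prod))
```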
2. Create a new PR.
a. The commit message will contain a Markdown report of validation results.
b. Carry out a code review and compare the changes with production (at the Git level).
To add new features, we recommend working in native Keboola Dev Branches, created through the project GUI in the DEV projects. If you have multiple projects and wish to test the pipeline across all of them, you will need to merge the dev branches first and then run the test in the DEV projects.
Some development work may take a long time to complete. In such cases, it may be worth creating a completely separate environment (e.g., a Git branch and a set of projects) and regularly rebasing it onto production, which may receive minor updates before the release date. This is similar to the release branching model.
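Keeping such a long-running environment current could look like the following Python sketch, which builds the Git commands for rebasing a feature branch onto the production branch. The branch and remote names are hypothetical, and after the rebase the feature environment's projects would still need to be synced from the branch, for example with the push action described earlier.

```python
from __future__ import annotations

import subprocess


def refresh_commands(feature_branch: str, production_branch: str, remote: str = "origin") -> list[list[str]]:
    """Git commands that rebase a long-running feature branch onto production."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", feature_branch],
        ["git", "rebase", f"{remote}/{production_branch}"],
        # Rebasing rewrites history, so the remote branch must be force-updated;
        # --force-with-lease refuses to overwrite remote commits we have not seen.
        ["git", "push", "--force-with-lease", remote, feature_branch],
    ]


def run(commands: list[list[str]]) -> None:
    """Execute the commands in order; stop on the first failure (e.g. a rebase conflict)."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in refresh_commands("feature/long-running", "main"):
        print(" ".join(command))
```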