Keboola DEV/PROD Lifecycle via Git

Keboola's virtual branching enables testing without affecting production, supporting Git for enterprise setups.

Developers
July 30, 2024

Keboola offers a robust virtual branching environment, enabling users to seamlessly create a shadow copy of their entire Keboola Project. This allows for the development and testing of changes without affecting the production environment. These development branches contain a copy of the underlying project data, ensuring that when a pipeline is executed in the development branch, the production data remains unaffected. The user interface provides a straightforward way to merge changes from the branch to production or update the branch with the latest production data.

This branching environment is typically sufficient for small to medium projects. However, in an enterprise setup, it may be necessary to have completely separate environments where both data and data pipeline definitions (code) are isolated. In such setups, administrators may need to define complex “branch protection” rules to closely control who can release new features into the production environment, as well as how and when these releases occur. In the software engineering world, this is often achieved with version control systems like Git.

Thanks to Keboola's CLI functionality, it is possible to define and synchronize separate environments, including the ones with a multi-project architecture setup, entirely via Git. This gives users the freedom to establish deployment rules according to their needs and allows for the testing of entire pipelines across multiple projects in completely isolated environments.


Model scenario

Let’s say our business works with sensitive financial and client data, and our security requires that only certain users are allowed to access the production data. This means that the development needs to happen on top of test data with the same structure as in production, but anonymized and with no real value. In this scenario, we structure our Keboola projects into three stages (projects):

  • L0: Acquisition
  • L1: Transformation
  • L2: Datamart

The data pipeline runs through L0 to L2 projects, where outputs are consumed by Tableau reporting.  

Aside from these Keboola projects, the infrastructure also includes other technologies, for example a MySQL application database and Tableau reporting. When a large new feature is rolled out, the deployment is synchronized via Jenkins across all environments at once.

Let’s consider two environments: PROD and DEV/TEST. The company already has CI/CD pipelines in place that can deploy MySQL and reporting changes at once through a Jenkins pipeline. All these deployments are also linked to a Jira ticket that belongs to the feature. Once the feature is approved (e.g., all related PRs are approved), it is deployed automatically across systems. Our task is to include the Keboola projects in that flow.

Using the CLI, we can create clones of all three projects with no data and synchronize them at once via Git. The aim is to create two environments:

PROD

Production set of projects

  • Production sources
  • Reporting is linked to a production Tableau instance

DEV/TEST
Development and test environment where new features are developed. Once they are ready and tested, they are released (merged) into production. The feature covers three projects, so changes in all projects are deployed at once in a single release.

  • Contains the same data structure but is linked to anonymized data sources or UAT sources.
  • Reporting is linked to a test Tableau instance.

High-Level Workflow

To implement the above suggested setup, we need the following tools:

  • Keboola CLI: syncs project representations, with the option to override the target environment enabled.
  • Keboola Variables Vault: a feature that allows users to define variables and secrets on a project level and reference them in configurations.
  • GitHub & GitHub Actions: a versioning system to hold the project representations and define deployment rules and validations.

Before every release, the state of the development branch must represent the desired release state. This requires coordination within the team to freeze any work before the development branch is merged and ensure the development branch only contains the changes required in the release.

To simplify this process, traditional Keboola Branches may be used to keep the DEV branch clean, as depicted in the diagram above.

Development Workflow

Initialization

We have prepared a sample Streamlit application that can be deployed as a Data App in the Keboola environment to help with the initialization process.

This app allows you to define the environment by specifying the names of the environment, a related Git branch, and the Keboola stack.
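As a rough illustration of what such an environment definition holds, here is a minimal Python sketch. It is not the code of the sample app: the class name `Environment`, its fields, and the helper `environment_for_branch` are hypothetical, and the host and project IDs are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Environment:
    """One deployment environment: a Git branch paired with a set of Keboola projects."""
    name: str          # environment name, e.g. "PROD" or "DEV"
    branch: str        # long-lived Git branch that mirrors the environment
    stack_host: str    # Keboola stack the projects live on (placeholder value below)
    project_ids: dict = field(default_factory=dict)  # stage name -> project ID


# Hypothetical definitions for the two environments of the model scenario.
ENVIRONMENTS = [
    Environment("PROD", "main", "connection.keboola.com", {"L0": 1234, "L1": 3456}),
    Environment("DEV", "dev", "connection.keboola.com", {"L0": 345, "L1": 6666}),
]


def environment_for_branch(branch: str) -> Environment:
    """Branch = Environment: look up the environment mapped to a Git branch."""
    for env in ENVIRONMENTS:
        if env.branch == branch:
            return env
    raise KeyError(f"No environment is mapped to branch {branch!r}")
```

Keeping the branch-to-environment mapping in one place means every action can resolve its target projects from the branch it runs on.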

Set project mappings:

Once you are finished with that, you will be able to generate a zip file with all the GitHub actions that you can use in the repository, using the detailed instructions provided in the GitHub setup:

GitHub Actions

We have prepared a set of example GitHub actions that facilitate the synchronization between environments.

Manual KBC Pull

Sync all projects from the selected environment (set of Keboola projects) into the respective GitHub branch. The sync will result in a new commit in the selected branch that will contain the results of the validations against the destination environment.

Parameters

| Parameter | Description |
|-----------|-------------|
| Branch | Select the branch the action should be executed against. Branch = Environment, so selecting the `dev` branch will initiate `kbc pull` from all projects mapped to the `DEV` environment into the `dev` branch. |
| Destination Environment (OPT) | Optional parameter that selects the destination environment to which you are planning the release. If it is selected, the commit message will include a validation report and a checklist that needs to be completed before merging into the destination environment. |

Usage

Use this action when you wish to synchronize the environment into the GitHub repository, e.g., before release.

If you select a destination environment, the resulting commit will contain a validation report and a checklist that needs to be completed before merging into the destination environment.
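To make the idea of a validation report with a checklist more concrete, the following Python sketch renders a list of findings as Markdown task-list items. The function name `render_checklist` and the report layout are assumptions for illustration; the actual GitHub action may format its commit message differently.

```python
from __future__ import annotations


def render_checklist(destination: str, findings: list[str]) -> str:
    """Render validation findings as a Markdown report with a task list."""
    lines = [f"## Validation report for release to {destination}", ""]
    if not findings:
        lines.append("No discrepancies found.")
    else:
        lines.append("Complete before merging:")
        lines.append("")
        # Markdown task-list syntax: each finding becomes an unchecked item.
        lines.extend(f"- [ ] {item}" for item in findings)
    return "\n".join(lines)
```

A reviewer can then tick off the items in the PR once each change has been applied in the destination environment.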

Checks

| Check | Description | Optional |
|-------|-------------|----------|
| Secrets Validation | Check whether all secret values are defined in the Vault rather than in the configuration. Runs each time. | NO -> fail when false |
| Vault Validation | Check whether the source and destination environment projects have matching Vaults -> e.g. all environments have the same variables defined. | YES -> output a list of discrepancies and include suggested action in the checklist. |
| Storage Validation | Compare the source and destination project storages. | YES -> output a list of changes and include suggested action in the checklist. |
| Bucket Links | List buckets that are linked in source project and not in destination and vice-versa. | |
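The first two checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by the example actions: it assumes that secret values live under configuration keys prefixed with `#`, and that a Vault reference is a string of the form `{{ vault.NAME }}`. The function names are hypothetical.

```python
import re

# Assumed shape of a Vault reference, e.g. "{{ vault.db_password }}".
VAULT_REF = re.compile(r"^\{\{\s*vault\.[\w.-]+\s*\}\}$")


def find_inline_secrets(config, path=""):
    """Return paths of '#'-prefixed keys whose value is not a Vault reference."""
    offenders = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else key
            if key.startswith("#"):
                # Assumption: '#' marks a secret; it must point to the Vault.
                if not (isinstance(value, str) and VAULT_REF.match(value)):
                    offenders.append(child)
            else:
                offenders.extend(find_inline_secrets(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            offenders.extend(find_inline_secrets(item, f"{path}[{index}]"))
    return offenders


def vault_discrepancies(source_vars, destination_vars):
    """Compare the variable names defined in two Vaults."""
    source, destination = set(source_vars), set(destination_vars)
    return {
        "missing_in_destination": sorted(source - destination),
        "missing_in_source": sorted(destination - source),
    }
```

In this sketch, a non-empty result from `find_inline_secrets` would fail the run, while the output of `vault_discrepancies` would only feed the checklist, matching the Optional column above.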

Manual KBC Push

Sync all project definitions from the selected environment (GitHub branch) into the respective Keboola Project via `kbc push`. The sync will replace all destination project configurations with the definitions in the related Git branch.

WARNING: This action will overwrite the destination project states. Make sure you pull the changes first. We do not recommend you use this command unless you are deploying changes via environments.

Parameters

| Parameter | Description |
|-----------|-------------|
| Branch | Select the branch the action should be executed against. Branch = Environment, so selecting the `dev` branch will initiate `kbc push` from the `dev` branch into all projects mapped to the `DEV` environment. |

Usage

Use this action when you wish to synchronize the environment from the GitHub repository into the related Keboola projects, e.g., during release.
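The per-project push can be pictured as a small plan built from the environment's variables and secrets. The Python sketch below only assembles the plan and does not call the CLI. It makes several assumptions that are not confirmed by this article: that each project lives in a repository subdirectory named after its stage, and that the CLI picks up the host, token, and target project from the environment variables `KBC_STORAGE_API_HOST`, `KBC_STORAGE_API_TOKEN`, and `KBC_PROJECT_ID`. Check the Keboola CLI documentation before relying on any of them.

```python
def build_push_plan(stages, env_vars, secrets):
    """Build one `kbc push` invocation per project stage.

    stages   -- stage names, e.g. ["L0", "L1"]
    env_vars -- environment variables of the GitHub environment (SAPI_HOST, PROJECT_ID_*)
    secrets  -- environment secrets of the GitHub environment (SAPI_TOKEN_*)
    """
    plan = []
    for stage in stages:
        plan.append({
            "cwd": stage,              # assumed: one subdirectory per project
            "cmd": ["kbc", "push"],
            "env": {
                # Assumed CLI environment variable names, see the lead-in.
                "KBC_STORAGE_API_HOST": env_vars["SAPI_HOST"],
                "KBC_STORAGE_API_TOKEN": secrets[f"SAPI_TOKEN_{stage}"],
                "KBC_PROJECT_ID": str(env_vars[f"PROJECT_ID_{stage}"]),
            },
        })
    return plan
```

Each plan entry could then be executed with `subprocess.run(step["cmd"], cwd=step["cwd"], env={**os.environ, **step["env"]}, check=True)`, so that a failed push stops the workflow.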

Checks

| Check | Description | Recoverable |
|-------|-------------|-------------|
| Secrets Validation | Check whether all secret values are defined in the Vault rather than in the configuration. Runs each time. | FAIL |
| Vault Validation | Check whether the destination environment project has all the variables used by the source project definition (determined by analyzing the branch). | NO -> output a list of discrepancies and include suggested action in the checklist. |
| Check missing linked buckets | | FAIL |
| Generate missing storage objects | | |

GitHub Repository Initialization

  1. Download the generated GitHub Actions zip file.
  2. Create and pull the GitHub repository.
  3. Unpack the `git_actions.zip` in the repository root folder.
    1. This will create a `.github` actions folder; overwrite it if it exists.
  4. Git commit, git push.
  5. Create long-lived Git branches for your environments (aside from the main one).
    1. E.g., `git checkout -b DEV`
  6. Commit and push to each created branch (consider leaving it on GitHub).
    1. This step may be skipped if you perform KBC PULL first on the main branch and then create branches in the GitHub repository off of that one.
  7. In the repository settings, create the following environments:
    1. `PROD`
    2. `DEV`
  8. You may set the ENV restrictions.
    1. Note that the actions require access to both related environments (DEV/PROD) in order to perform comparison validations.
  9. For each environment, set the following:

Environment secrets:

| Variable Name | PROD value | DEV value |
|---------------|------------|-----------|
| SAPI_TOKEN_L0 | Storage token for project L0 | Storage token for project L0 |
| SAPI_TOKEN_L1 | Storage token for project L1 | Storage token for project L1 |

Environment variables

| Variable Name | PROD value | DEV value |
|---------------|------------|-----------|
| SAPI_HOST | connection.keboola.com | connection.keboola.com |
| PROJECT_ID_L0 | 1234 | 345 |
| PROJECT_ID_L1 | 3456 | 6666 |
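Because every project stage needs its own token and project ID, it is easy to forget one when adding a new stage. A minimal Python sketch of a pre-flight check follows; it uses the naming pattern from the tables above (`SAPI_HOST`, `PROJECT_ID_<stage>`, `SAPI_TOKEN_<stage>`), while the function names are hypothetical.

```python
def required_names(stages):
    """List the variable and secret names one environment needs for the given stages."""
    names = ["SAPI_HOST"]
    for stage in stages:
        names.append(f"PROJECT_ID_{stage}")
        names.append(f"SAPI_TOKEN_{stage}")
    return names


def missing_names(stages, defined):
    """Return required names that are absent or empty in the `defined` mapping."""
    return [name for name in required_names(stages) if not defined.get(name)]
```

Inside a workflow, `defined` could simply be `os.environ`, provided the environment variables and secrets are exported to the job under the same names.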

Perform initial sync

  1. Go to repository > Actions.
  2. Click on the action Manual KBC Pull (L0, L1).
    1. Select branch `main` and hit Run.
    2. Optionally, select a destination branch.
      a. When selected, validations against the selected environment projects will be run.
  3. Rebase new branches onto main.
  4. For each branch (except production), run Manual KBC Push (L0, L1).

Release to production

Before every release, the state of the development branch must represent the desired release state. This requires coordination within the team to freeze any work before the development branch is merged and ensure the development branch contains only changes required in the release.

 

Storage changes

Prior to deployment, the user needs to make sure that all changes in the Storage structure that are not auto-replicable are performed in the destination environment. The suggested supporting GitHub action generates a checklist of changes for the reviewer to check. These can be applied manually or by using the storage merger application in each project in the DEV/TEST environment; this will then run on Production after deployment.

Note that most of the Storage changes, such as table creations, will be performed by the components (so no action is needed).
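The kind of comparison behind such a checklist can be sketched as follows. This Python example is an assumption about what a Storage comparison might look like, not the logic of the actual action: it takes each project's Storage as a plain mapping of table ID to column names, and the function name `storage_changes` and the table IDs in the usage note are made up.

```python
def storage_changes(source, destination):
    """Describe structural differences between two Storage snapshots.

    Both arguments map a table ID to the list of its column names.
    """
    changes = []
    for table_id in sorted(source):
        if table_id not in destination:
            changes.append(f"Create table {table_id}")
            continue
        missing = [col for col in source[table_id] if col not in destination[table_id]]
        if missing:
            changes.append(f"Add columns to {table_id}: {', '.join(missing)}")
    # Tables that exist only in the destination are flagged for review, not dropped.
    for table_id in sorted(set(destination) - set(source)):
        changes.append(f"Review table present only in destination: {table_id}")
    return changes
```

For example, comparing a DEV snapshot that has a new `currency` column in `in.c-sales.orders` against PROD would yield a single item asking the reviewer to add that column.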

  1. Run the KBC PULL action in the DEV environment.
    a. Specify the source branch (development) and a target environment (prod).
    b. This will sync changes into the Git repository.
  2. Create a new PR.
    a. The commit message will contain a Markdown report of validation results.
    b. Carry out a code review and compare changes with production (at Git level).
  3. Update the Variables setup and Storage if needed.
    a. Follow the checklist produced by the PULL action.
  4. Optionally, run the PULL operation again to check the validation result.
  5. Merge development to main.
  6. If validations were successful, create a release.
    a. This will trigger a run of the Storage structure changes and a push to PROD.
    b. Validations are run, and the operation fails if non-recoverable errors are found.
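The last step distinguishes between findings that stop the release and findings that only need a reviewer's attention. A minimal Python sketch of such a gate is shown below. Treating Secrets Validation and missing linked buckets as blocking follows the FAIL entries in the push checks table; the check identifiers, the function name `gate_release`, and treating every other check as non-blocking are assumptions for illustration.

```python
# Checks marked FAIL in the push checks table; the identifiers are hypothetical.
BLOCKING_CHECKS = {"secrets_validation", "missing_linked_buckets"}


def gate_release(results):
    """Split validation findings into blocking errors and checklist items.

    results -- mapping of check name to a list of finding descriptions
    Returns (ok, blocking, checklist); ok is False if any blocking finding exists.
    """
    blocking = []
    checklist = []
    for check, findings in results.items():
        target = blocking if check in BLOCKING_CHECKS else checklist
        target.extend(f"{check}: {finding}" for finding in findings)
    return (not blocking, blocking, checklist)
```

A release workflow built this way would abort the push when `ok` is false and otherwise attach the checklist items to the release notes or the PR.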

Adding new features / branching

To add new features, we recommend working in native Keboola development branches, created through the project GUI in the DEV projects. If you have multiple projects and wish to test the pipeline across all of them, you will need to merge the development branches first and then run the test in the DEV projects.

Long-term development initiatives

Some development initiatives take a long time to complete. In such cases, it may be worth creating a completely separate environment (e.g., a branch and a set of projects) and rebasing it regularly from production, which may receive minor updates until the release date. This is similar to the release branching model.
