In this tutorial, we will guide you step-by-step through your Keboola and ThoughtSpot integration.
Follow along to go from zero to search in minutes.
To follow along with this tutorial, you will need both a Keboola account and a ThoughtSpot account.
Don’t have one yet? Not a problem!
Keboola has an always-free tier (no credit card required) that you can use to play around and learn the ropes of the platform. Head over to Keboola’s webpage and create a new account by filling in the form:
You can do the same with ThoughtSpot, which offers a generous 30-day free trial (no credit card required). Visit https://www.thoughtspot.com/trial and set up your account from there:
In this article, we’ll work under the assumption that you have a Snowflake data warehouse that you can use with ThoughtSpot. Don’t have one? Don’t worry! We’ll show you how to use Keboola to set up Snowflake in minutes.
Now that you have the tools for the task at hand, let us start extracting data.
We start the integration between Keboola and ThoughtSpot by collecting raw data.
You can use Keboola to extract data from over 200 different sources - anything from Facebook Ads to AWS S3 buckets.
In this tutorial, we will work with the Wine dataset, a CSV file that collects ratings and prices of wines based on their year of production, country of origin, and several other factors. Head to the link above and download the dataset.
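If you want a quick feel for the data before uploading it, a few lines of pandas will do. The file name below is an assumption, so adjust it to match your download:

import pandas as pd

# Preview the downloaded dataset; "red_wines.csv" is a placeholder
# for whatever your downloaded file is called.
wines = pd.read_csv("red_wines.csv")

# Columns used later in the tutorial: Name, Country, Region, Winery, Rating, Price, Year.
print(wines.head())
print(wines["Country"].value_counts())  # ratings per country, relevant for the cleaning step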
Next, log into your Keboola account.
If this is your first time logging in, Keboola will ask you to connect your first data source:
If you already have a Keboola account, go to Extractors (menu bar) and use the search bar to select the CSV extractor.
The configuration wizard will guide you through the data upload:
Side note: Keboola offers over 200 automated data extractors. But what happens if you use a data source that’s not covered by Keboola’s ecosystem of automated data collection apps? We’ve got you covered.
With the low-code Generic extractor, you can build your own data collection app for any RESTful API. Just dust off your coding keyboard and get typing! Unusual API? Also not a problem. Sign up for our developer portal to create your own components, or leverage our partner ecosystem to have one developed for you!
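To give you a flavor of it, here is a minimal Generic extractor configuration sketch, written as a Python dict for readability. The base URL, endpoint, and output table name are hypothetical placeholders; check the Generic extractor documentation for the full configuration schema:

import json

# A minimal, hypothetical Generic extractor configuration.
config = {
    "api": {
        "baseUrl": "https://example.com/api/"  # root of the REST API you want to pull from
    },
    "config": {
        "jobs": [
            {
                "endpoint": "wines",   # resource path appended to baseUrl
                "dataType": "wines"    # name of the resulting table in Keboola Storage
            }
        ]
    }
}

print(json.dumps(config, indent=2))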
Data cleaning is an integral part of any data pipeline. It allows you to validate and sanitize data before analyzing it.
Whether you want to remove some data, aggregate it by dimension, or pick just the rows of your choice, you can transform your data within Keboola before connecting it to ThoughtSpot.
For this tutorial, we will remove wines from countries with fewer than 50 wine ratings, so that small-sample outliers do not sway the later analytics.
Go to Transformations > New transformation and pick the language of your choice to do the data transformations (Snowflake SQL, Python, or R). For this example, we will pick “Snowflake SQL Transformation”:
Name your transformation “red_wine_countries_50_reviews_min” and give it a memorable description. Click “Create transformation”.
Configure the transformation with the following parameters:
-- Build a cleaned table that keeps only wines from countries
-- with at least 50 ratings.
CREATE OR REPLACE TABLE "cleaned_red_wines" AS (
    -- Countries with too few ratings to be representative.
    WITH countries_under_50_reviews AS (
        SELECT "Country"
        FROM "red_wines"
        GROUP BY "Country"
        HAVING COUNT(*) < 50
    )
    SELECT
        "Name"    AS wine_name,
        "Country" AS country_of_origin,
        "Region"  AS region_of_origin,
        "Winery"  AS winery,
        "Rating"  AS rating,
        "Price"   AS price,
        "Year"    AS year_of_production
    FROM "red_wines"
    WHERE "Country" NOT IN (SELECT "Country" FROM countries_under_50_reviews)
);
The data you just cleaned can be found under Storage > cleaned_wines > cleaned_red_wines.
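If you want to sanity-check the filter, you can reproduce the same logic locally on the raw CSV with pandas (file name assumed, as in step 1):

import pandas as pd

wines = pd.read_csv("red_wines.csv")  # the raw file from step 1 (name assumed)

# Countries the transformation drops: those with fewer than 50 ratings.
counts = wines["Country"].value_counts()
dropped = counts[counts < 50]
print(f"Dropping {len(dropped)} countries ({dropped.sum()} rows in total)")
print(dropped)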
It is time to share the data with ThoughtSpot so we can analyze it.
Before connecting your data to ThoughtSpot, you will have to configure the ThoughtSpot integration.
If you have a Falcon-based ThoughtSpot account with its own data processing engine, the configuration will be different - proceed to 3.2 instead.
ThoughtSpot Cloud requires you to have a Cloud Data Warehouse (CDW) instance up and running, from which ThoughtSpot can read your data. Keboola supports all the major cloud data warehouses, and the integrations look similar; if you already use Snowflake, Redshift, or BigQuery, you can easily land the data there to be integrated with ThoughtSpot later.
If you don’t have your own cloud data warehouse, do not despair: Keboola can provision one for you. We’ll show you how in the next steps.
Snowflake
What follows is the Snowflake example (also used if you don’t have a data warehouse yet). For other CDWs, just select the appropriate data destination - the remaining steps are the same.
In Keboola, go to Components > Directory > search for Snowflake > click on the “Use this” button to select the Snowflake data destination.
You will be redirected to the configuration page for the Snowflake Data Destination component. Click “Set up credentials”.
Keboola will offer you two options to set up a Snowflake database:
Here we select “Keboola Snowflake database”, which provisions a Snowflake instance for us:
The next screen will show you the credentials Keboola created for you. Make sure to copy them (especially the password; you won’t see it again). You will use these credentials later when connecting to ThoughtSpot.
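If you want to verify the credentials before moving on, a quick connection test with the snowflake-connector-python package looks like this. Every value below is a placeholder to be replaced with the values from your credentials screen; the schema mirrors the “WORKSPACE_…” value used later in this tutorial and is just an example:

import snowflake.connector

# Replace each placeholder with the credentials Keboola generated for you.
conn = snowflake.connector.connect(
    account="your_account",      # the host name without ".snowflakecomputing.com"
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
    database="your_database",
    schema="WORKSPACE_11263678",  # example value; use your own schema
)

cur = conn.cursor()
cur.execute("SELECT CURRENT_USER(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
print(cur.fetchone())
conn.close()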
Go one step back to the configuration of your Snowflake Data Destination. You will write your wine data to the newly created Snowflake database in three steps:
Keboola will now offer you the choice to further specify the details of how your data should be written to Snowflake (load type, primary keys, data types of fields, etc.). Feel free to play around. Once you decide how the data should be written to Snowflake, click “Run component” on the right-hand side.
Keboola has now written your red wine data to Snowflake. Time to connect the Snowflake instance to ThoughtSpot, so you can see your data there.
With the Snowflake credentials at hand, head over to your ThoughtSpot account and follow along with their onboarding wizard.
Step 1: Select “Snowflake” as the data source for ThoughtSpot and name it something memorable:
Step 2: Insert the connection details from Keboola into ThoughtSpot. Every field is named the same across the two platforms, except for ThoughtSpot’s “Role”. Use the value from Keboola’s “Schema” for ThoughtSpot’s “Role” (in the example below, the value is “WORKSPACE_11263678”).
Step 3: Select the data you want to work on in ThoughtSpot. In the image below, we selected the cleaned wine dataset.
In this step, you can also adjust the data types. For example, the columns Year, Price, and Rating should not be VARCHAR.
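If you prefer to fix the types upstream so the columns arrive in ThoughtSpot correctly typed, one option is a Python transformation in Keboola along these lines. The in/ and out/ table paths follow Keboola’s transformation conventions, but treat the exact paths and the output table name as assumptions:

import pandas as pd

# Read the transformation's input table (path per Keboola's convention, assumed here).
df = pd.read_csv("in/tables/cleaned_red_wines.csv")

# Cast the numeric columns so they are not written out as VARCHAR.
df["year_of_production"] = pd.to_numeric(df["year_of_production"], errors="coerce").astype("Int64")
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Write the typed table for the next step in the pipeline.
df.to_csv("out/tables/typed_red_wines.csv", index=False)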
And voilà! After clicking “Create connection”, your wine data is available in ThoughtSpot for you to build Liveboards and SearchIQ queries.
For Falcon-based ThoughtSpot accounts, go to Components > Directory in Keboola and search for the ThoughtSpot data destination.
The wizard will guide you through the ThoughtSpot component’s configuration (just in case, detailed instructions are here).
Next, select the data tables that will be available to ThoughtSpot.
For each column, you can additionally specify the name of the column as you want it to appear in ThoughtSpot and the associated data type.
Finally, you can click “Run component” to connect your data to ThoughtSpot immediately, or “Automate” to set up scheduled data connections.
Once properly set up and configured, you can let your Data Stack do the heavy lifting for you.
Simply click “Automate” next to each component (image above), and the entire end-to-end data stack will automatically run on a schedule.
The tutorial used the simple wine CSV example to quickly showcase how to integrate Keboola and ThoughtSpot. But you could integrate and automate multiple data pipelines - anything from extracting Facebook Ads data for marketing reports to ML analyses of custom JSON files on Amazon S3.
If you have not already, check out what Keboola has to offer with its no-questions-asked, always-free tier, and start building automated end-to-end data pipelines that will make your data efforts shine without additional engineering sweat.