Understanding OLAP Cubes - A guide for the perplexed

How To

March 4, 2022

5 min read

No items found.

What are they, how do they work, and when should your enterprise deploy an OLAP cube to speed up analytic processes?

Thank you for your registration! We received your contact details. Expect an email with information about the event. We look forward to seeing you there!

Oops! Something went wrong while submitting the form. Try it again please.

Scroll to download

Companies move as fast as their slowest insight.

In the era of big data, information is often produced faster than it can be consolidated and analyzed throughout the Enterprise.

Analysts battle for the limited engineering resources, so they can construct data sets for their analytics endeavors to ultimately provide insights to fast-growing enterprises.

But business intelligence (BI) is often stalled by the limited engineers’ hours or analysts’ lack of engineering skills needed to construct such datasets from raw data.

This is where OLAP cubes come into play.

They are pre-aggregated data structures that allow analysts a multitude of analytic queries without having to constantly ask engineers for their time.

In this guide, we will look at OLAP cubes: what are they, how do they work, and when should your enterprise deploy an OLAP cube to speed up analytic processes.

What is OLAP?

OLAP or OnLine Analytical Processing refers to the architecture and features of databases that specialize in analytical queries.

Unlike OLTP (OnLine Transaction Processing) relational databases, that tune query performance for transactional data (small but fast data writes), OLAP systems specialize in performing complex read queries over large amounts of data.

The OLAP cube is a specialized data structure that results from aggregating data in a useful manner.

What are OLAP cubes?

OLAP cubes are a multidimensional representation of data.

That’s a mouthful.

To understand the multidimensional array of OLAP cubes, let us start simple - with a 2-dimensional representation we all know: Microsoft Excel.

Excel spreadsheets organize data into two dimensions: rows for each entry, and columns for attributes of that entry:

OLAP cubes go one step further and aggregate data across dimensions. For example, an OLAP cube could aggregate total sales amount across three dimensions: city, product, and time.

‍

The data cube allows analysts to quickly get aggregated sales data. For example, it takes only one lookup query to get total sales in London in 2021 ($ 30,000).

Unlike the alternative, writing SQL queries over Excel-like data that multiply the necessary GROUP BY, WHERE, and UNION operations to make the same results happen.

How are OLAP cubes structured?

OLAP cubes aggregate a metric (often called a fact, saved in a fact table) such as “total sales” or “number of purchases” over dimensions.

Dimensions are anything you can group data by. Common dimensions are time (group by year, quarter, month, …), geolocation (continent, region, country, …), product categories, etc.

The OLAP cube usually has three dimensions, but it can be constructed of an infinite number of dimensions (called MOLAP - multidimensional OLAP cube). Each dimension is saved into its dimension tables.

To organize metrics and dimensions in a format an OLAP cube can use to construct aggregate data, we model the original data into fact tables and dimension tables using data modeling paradigms called star schemas or snowflake schemas (because their shape resembles a star or a snowflake).

Dimensions are organized in what we call a concept hierarchy. That is, dimension values can have hierarchical parents and children. For example, the location value for city “London” has a parent dimension value country equal to “UK”.

This allows us to perform aggregations and multiple operations on different granulations of the data cube.

What analytical operations can you perform with OLAP cubes?

OLAP cube supports multiple operations that come in handy in the analyst’s workday.

Slice

Slicing allows us to look at data by fixing (or picking) just one value of a dimension. In the example below, we have sliced the data cube to look at data from only 2004.

Dice

The dice operation is very similar to the slice, but instead of taking a single value for a dimension, we take more than one value in the dimensions we are choosing.

In the example below, we have diced the data cube along the product dimension (picked just “Accessories”, “Outdoor protective equipment”, and “Golf equipment”) while we kept the date dimension and the region dimension intact.

In essence, dicing operations create a subcube of the original OLAP cube.

Drill-down & Roll-up

Drill-down and roll-up allow us to go deeper into the hierarchy of one dimension (drill-down) or aggregate an entire dimension up (roll-up).

In the example below, if we move from left to right, we have drilled down into the “Outdoor protective equipment” to show only its children (“Insect repellant”, “Sunblock”, and “First aid”).

While if we moved in the opposite direction, and aggregated total_sales across “Insect repellant”, “Sunblock”, and “First aid”, we would have rolled the values up into the “Outdoor protective equipment”.

Pivot

The pivot operation is going to ring some Excel bells. Its spreadsheet equivalent is the (in)famous pivot table operation.

The OLAP pivot operation just rotates the cube around its axes, giving us a view that is better equipped for the various reporting and visualizations we are creating.

Advantages and disadvantages of OLAP cubes

OLAP cubes are extremely useful for regular reporting and ad hoc analyses.

By structuring the data into dimensions and facts, OLAP cubes also support a variety of analytic endeavors, such as:

Complex analytics queries
Business intelligence
Predictive scenarios and time-series forecasting
Data mining (in fact, data mining tools often take in data in an OLAP cube format)

However, OLAP cubes have also shortcomings. Depending on the practical implementation, OLAP cubes might either need a specialized server (increases overall operational costs) or need specialized knowledge for implementation and maintenance.

Let’s take a look at the practical implications of OLAP cubes.

OLAP cubes in practice

Historically, OLAP cubes were built on specialized OLAP (or MOLAP or ROLAP) servers. An example of that is Microsoft’s SQL Server Analysis Services (SSAS).

But the industry shift has been to move away from dedicated OLAP servers and into modern data warehouses.

Modern data warehouses utilize three features for OLAP analytics:

Cheap infrastructure to provide scalability at ease and cheap memory and processing.
Massive parallel processing to handle huge data loads.
Columnar data storage to ease access to data (read operations).

With all three characteristics in place, it has become progressively easier to perform OLAP cube queries directly on the databases within data warehouses. Without needing to construct a specialized data structure on its devoted server.

But are OLAP cubes the right solution for your business? Let us take a look at the advantages and disadvantages to better understand the tradeoffs?

Advantages of OLAP cubes

OLAP cubes are extremely useful for regular reporting and ad hoc analyses.

By structuring the data into dimensions and facts, OLAP cubes also support a variety of analytic endeavors, such as:

Complex analytics queries
Business intelligence
Predictive scenarios and time-series forecasting
Data mining (in fact, data mining tools often take in data in an OLAP cube format)

However, OLAP cubes have also shortcomings.

Disadvantages of OLAP cubes

There are multiple challenges to working with OLAP cubes:

Technical hurdles and costs. Depending on the practical implementation, OLAP cubes might either need a specialized server (increases overall operational costs) or need specialized knowledge for implementation and maintenance.
Slow to build and high overhead. OLAP cubes require a lot of design thinking before building them. And they require a devoted person to maintain them. Data changes all the time, and alterations to the upstream data sources often cause OLAP cubes to fail downstream.
Lack of accessibility. Depending on the implementation specifics, OLAP cubes can often be inaccessible to non-technical coworkers. Be it the OLAP special server, or a dialect of SQL they need to master before sales and marketing people could run queries, OLAP cubes are often an “advanced” data structure that works against data democratization and causes data analysts and engineers to be the bottleneck for generating insights for other non-technical co-workers.
Lack of flexibility. OLAP cubes were constructed under the assumption of a “fixed business operations” model. The main dimensions and metrics would not change much and OLAP cubes could always answer the pertinent BI questions. But the reality of data is different. Data and the business processes that data describes are changing and developing at an accelerated rate. From a new data tool that you want to integrate into your customer journey to a shift in business focus that requires a new set of metrics, OLAP cubes do not offer the versatility and flexibility to handle an ever-changing data and business landscape.

So what can be used to compensate for the OLAP cubes’ shortcomings?

Alternatives to OLAP cubes

There are two ways you can compensate for the OLAP cubes' shortcomings.

First, there is software that automates data mining and exploration and makes it accessible to lay users. An example of that is ThoughtSpot. ThoughtSpot allows any users within your company to explore data themselves and discover interesting trends and insights. Without your engineers needing to devote time to build and maintain the underlying data.

Check how you can use ThoughtSpot and Keboola together to get insights faster.

Second, the Modern Data Stack offers a different approach to data engineering and data architecture.

OLAP cubes come from a time when data engineers and developers were the only people who could build metrics and KPIs to track progress across the company. Everyone else depended on them.

With the rise of platforms like Keboola, that empower non-technical people to build data pipelines themselves (check how easy it is) and the move from centralized architectures to data mesh architectures, OLAP cubes are replaced by more flexible data pipelines. Data pipelines that are constructed (and deconstructed) by domain experts to track metrics and KPIs for domain experts.

The Modern Data Stack, with its tools and architectural designs, offers an alternative to OLAP. OLAP is the reliable, old-school BI tool. While the Modern Data Stack is the flexible, new approach to constructing metrics and KPIs.

Get to insights faster with Keboola

Whether you are a die-hard OLAP fan, or want to build a Modern Data Stack, Keboola is here to assist you.

With over 250 integrations between sources and destinations, Keboola can help you automate the engineering you need to create OLAP cubes, data mesh architectures, or even integrate with advanced tooling like ThoughtSpot. Extract, transform, and load data from your sources to the (OLAP) destination of your choice with a couple of clicks.

Try it out.

Keboola offers a no-questions-asked, always-free tier, so you can play around and build your pipelines with a couple of clicks.

Complete the form below to get your complimentary copy.

Oops! Something went wrong while submitting the form.

Run a 100% data-driven business without any extra hassle. Pay as you go, starting with our free tier.

Try Keboola today

Download for Free

Oops! Something went wrong while submitting the form. Try it again please.

Understanding OLAP Cubes - A guide for the perplexed

What is OLAP?

What are OLAP cubes?

How are OLAP cubes structured?