Learn all about distributed storage and eventual consistency.
Distributed systems have unlocked high performance at large scale with low latency.
You can run your applications worldwide from the comfort of your Amazon Web Services (AWS) region in California, and a user adding an item to their shopping cart in Japan will not notice any delay or system faults.
However, distributed systems - and specifically distributed database systems - also malfunction.
Because of their multiple components, different levels of abstraction, and parallel processes that overlap each other, distributed database systems are notoriously hard to debug.
This is why it is important to understand architectural choices when building or deploying distributed systems.
Once you understand what goes on under the hood, it is easier to keep the engines running. One such choice is the trade-off between eventual consistency and strong consistency.
To better understand what hangs in the balance, we first have to understand how distributed storage is designed.
Distributed storage is achieved via database replication. The data is replicated across several distinct nodes or servers. The nodes communicate with each other through network and data replication protocols that are specific to the database architecture.
Each replica holds a copy of (all) the data, and therefore has the resources to serve every read and write request client applications send to that node.
With this redundancy (data kept in multiple copies), distributed databases achieve their advantages: low latency, because clients can access data close to them instead of querying across the globe; high performance and high availability, because query loads are spread across the system instead of burdening a single node; and simple, affordable scaling - just add another node.
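To make this concrete, here is a minimal, hypothetical sketch in Python (the node names, regions, and routing logic are invented for illustration - this is not a real replication protocol) of writes being pushed to every node while reads are served from the replica closest to the client:

```python
class Replica:
    """A node that holds a full copy of the data."""
    def __init__(self, name, region):
        self.name = name
        self.region = region
        self.store = {}  # full copy of the dataset

    def write(self, key, value):
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)


class ReplicatedStore:
    """Naive client-side view of a replicated database."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Simplified replication: push the write to every node.
        for replica in self.replicas:
            replica.write(key, value)

    def read(self, key, client_region):
        # Low latency: serve the read from the replica in the
        # client's own region instead of querying across the globe.
        nearest = min(
            self.replicas,
            key=lambda r: 0 if r.region == client_region else 1,
        )
        return nearest.read(key)


store = ReplicatedStore([Replica("node-a", "us-west"),
                         Replica("node-b", "ap-northeast")])
store.write("cart:42", ["keyboard"])
print(store.read("cart:42", client_region="ap-northeast"))  # served locally
```

A real replication protocol is far more involved, but this captures why redundancy buys low latency and easy scaling: reads stay local, and capacity grows by simply adding replicas.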
Distributed storage systems work perfectly well. Until they don’t.
With big data, the volume, velocity, and variety of data that is ingested and processed by the system have increased by orders of magnitude.
Events that previously happened only rarely are now common.
“When a system processes trillions and trillions of requests, events that normally have a low probability of occurrence are now guaranteed to happen and must be accounted for upfront in the design and architecture of the system.” - Werner Vogels
Eventual consistency and strong consistency are two design and architecture choices for how to deal with distributed systems at their edge cases - when they malfunction.
Distributed storage systems have three desirable qualities:
- Consistency: every read receives the most recent write (or an error).
- Availability: every request receives a response, without a guarantee that it reflects the most recent write.
- Partition tolerance: the system keeps operating even when network failures prevent some nodes from communicating.
The CAP theorem states that a distributed system can guarantee at most two of the three at all times. For the majority of the time, distributed systems work great. But when they malfunction, we need to make a design choice about which two desirable characteristics to keep.
“Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability.” - Werner Vogels
Handling network partitions is the foundation of distributed databases, so we cannot sacrifice partition tolerance when we talk about them.
A consequence of the CAP theorem, therefore, is that we either have CP systems, which are consistent and partition tolerant, or AP systems, which are available and partition tolerant.
Systems that prioritize availability over consistency will serve read and write requests at all times. Even if the node being queried is out of sync (due to network partitions - read: failures) and returns stale data that does not reflect the latest system-wide updates, the system will respond.
On the other hand, systems that prioritize consistency over availability will reject read or write requests rather than accept or return data that would be inconsistent with the other nodes in the system.
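A toy sketch of the two behaviors during a partition, assuming a hypothetical Node class (not any real database API), might look like this:

```python
class NodeUnavailable(Exception):
    """Raised by a CP node that refuses to serve while out of sync."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = "old-value"  # last replicated value
        self.in_sync = True       # False while partitioned from its peers

    def read(self):
        if self.mode == "CP" and not self.in_sync:
            # Consistency over availability: refuse rather than serve
            # data that might be stale.
            raise NodeUnavailable("cannot guarantee the latest value")
        # Availability over consistency: always answer,
        # possibly with stale data.
        return self.value


cp, ap = Node("CP"), Node("AP")
cp.in_sync = ap.in_sync = False  # a network partition occurs

print(ap.read())  # "old-value" - stale, but the system responds
try:
    cp.read()
except NodeUnavailable as err:
    print("CP node rejected the read:", err)
```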
The two design patterns - CP and AP - are not entirely at odds. Because partition tolerance is preserved in both, each still offers a kind of consistency, just not necessarily the one specified by the CAP theorem.
CP systems employ a type of consistency called “strong consistency”, while AP systems guarantee a type of consistency called “eventual consistency”.
Strong consistency is the consistency guaranteed by the CAP theorem. Data is consistent across nodes, irrespective of system availability or network partitions. To an outside observer, all new updates to the system look as if they were applied serially, on a single node. From the outside it does not matter if, in reality, two new updates are delivered to two different nodes - the system will synchronize internally before exposing those updates to read requests, making it seem as if a single consistent system is running in the background.
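As a rough illustration of the idea (a deliberately simplified sketch - real systems use consensus protocols such as Paxos or Raft rather than this naive all-replica acknowledgement), a strongly consistent write only succeeds once every replica has applied it:

```python
class Replica:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value
        return True  # acknowledgement


def strongly_consistent_write(replicas, key, value):
    # Synchronous replication: the write is committed only if every
    # replica acknowledges it; otherwise the client must retry
    # (a real system would also roll back or run a consensus round).
    acks = [replica.apply(key, value) for replica in replicas]
    if not all(acks):
        raise RuntimeError("write rejected: could not reach all replicas")
    return "committed"


replicas = [Replica(), Replica(), Replica()]
strongly_consistent_write(replicas, "balance:alice", 100)
# Every node now returns the same value, so reads look serial
# to an outside observer.
print([r.store["balance:alice"] for r in replicas])  # [100, 100, 100]
```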
On the other hand, the eventual consistency model prioritizes availability over consistency. It will accept any read or write request on any node, even if the data on that node is not yet in sync with the other nodes in the network. That means we can have a situation where one client writes a new value to one node, while another client reading from a different node a moment later still receives the old value, because the update has not yet propagated.
The eventual consistency model guarantees consistency throughout the system, but not at all times. There is an inconsistency window, during which a node might not have the latest value but will still return a valid response when queried, even if that response is not the most recent one.
The length of the inconsistency window is usually very short - only milliseconds long - and is determined by the load on the system, the number of replicas involved in the scheme, and the communication delay between nodes.
The eventual consistency model accepts this short-lived inconsistency window rather than sacrifice the low latency of a highly available system.
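Here is a toy simulation of that inconsistency window, with a hypothetical replication delay standing in for real network and replication lag:

```python
import threading
import time


class EventuallyConsistentPair:
    """Two replicas with asynchronous (delayed) replication."""
    def __init__(self, replication_delay=0.05):
        self.primary = {}
        self.secondary = {}
        self.delay = replication_delay  # simulated replication lag

    def write(self, key, value):
        self.primary[key] = value  # acknowledged immediately: high availability

        def replicate():
            time.sleep(self.delay)       # the inconsistency window
            self.secondary[key] = value  # eventually, the nodes converge

        threading.Thread(target=replicate).start()


db = EventuallyConsistentPair()
db.write("cart:42", ["keyboard", "mouse"])

print(db.secondary.get("cart:42"))  # None - a stale read inside the window
time.sleep(0.1)                     # wait out the window
print(db.secondary.get("cart:42"))  # ['keyboard', 'mouse'] - converged
```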
Note - there is also a third type of consistency called “weak consistency”. We do not spend much time on it, since it is not a proper consistency guarantee, but feel free to explore the topic further elsewhere.
We should not confuse strong/eventual consistency with the consistency guarantee of ACID databases.
In an ACID-compliant system, transactions - aka changes to the database - have the properties of atomicity, consistency, isolation, and durability.
The consistency guarantee in ACID means that any transaction executed over the database will leave the database in a valid or consistent state after the transaction is committed.
The “valid or consistent state” refers to business rules expressed as integrity constraints: referential, not-null, and other SQL constraints.
For example, if the entity table customers has a not-null constraint over the customer_email field, an operation that tries to insert a new row into the table with customer_email=null will be aborted and rolled back.
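A minimal sketch of that behavior using Python's built-in sqlite3 module (the table and column names follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "  id INTEGER PRIMARY KEY,"
    "  customer_email TEXT NOT NULL"
    ")"
)

try:
    # Violates the NOT NULL constraint, so the statement is rejected
    # and the database remains in a consistent state.
    conn.execute("INSERT INTO customers (customer_email) VALUES (NULL)")
except sqlite3.IntegrityError as err:
    print("insert aborted:", err)
    # insert aborted: NOT NULL constraint failed: customers.customer_email
```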
Strong/eventual consistency, on the other hand, refers to data consistency across nodes (not within a single database) - that is, whether different nodes hold the same value for a given data item at any given time.
How acceptable eventual consistency is depends on the client application.
The trade-off between a system that is occasionally unresponsive but strongly consistent, and a highly responsive system that is only eventually consistent, is purely a business one.
Popular systems have been built on eventual consistency. For example, the Domain Name System (DNS), which resolves domain names into IP addresses, is eventually consistent. Without DNS, the internet would not run as smoothly as it does.
Generally, people argue that financial transactions (shopping carts, order processing, etc.) need to be strongly consistent, while product features like Facebook's feed or Twitter's recommendations do not need to reflect the very latest value in the database and can be eventually consistent. We can enjoy the Facebook feed even if we do not see our friends' latest posts, while a sluggishly updated bank account could cause financial problems.
But in reality, even financial institutions often deploy eventually consistent systems, with warnings in their terms and conditions stating that it might take up to 24 hours to fully process a transaction.
This is because eventually consistent systems are, in fact, consistent the vast majority of the time, and the frustration of service unavailability in strongly consistent systems usually produces more grumpy customers.
Ultimately, though, the design choice of whether you will deploy a strongly consistent system or an eventually consistent system will reflect your business needs.
Eventual consistency vs strong consistency introduces just one of many data engineering trade-offs you will have to make when designing your data pipelines.
Start your journey towards becoming a (better) data engineer with Keboola’s Data Engineer Certificate, and develop a competitive engineering skillset that will help you stand out.