Data Architecture is hard. The first, second and third generations of Data Architecture have shown how difficult it is to unpack the underlying limitations of each generation.
The Data Architecture Generations
Generation #1 – Proprietary data warehouses and BI systems with heavy spending on licensing, which only some organizations can actually handle. ETL is king here. No business involvement here.
Generation #2 – Data lakes, some of them built to serve “everything”, from operational data to analytical data, like a proper silver bullet. Huge “delivery times” for operational data due to the nature of the technology.
Generation #3 – Real-time “everything” architecture, cloud-service oriented, unifying operational data with analytical data. It tries to tackle the huge cost issues of the first generation while solving some of the challenges that real-time operational data brings. Machine learning is part of these architectures, both to solve problems and to introduce complexity.
Being a data engineer today is quite hard for several reasons, and in any data architecture generation the data engineer's problems tend to be the same:
1) They need to consume data from teams who have no incentive to provide “good and easy data”.
2) They are disconnected from the business source domains that generated the data.
3) They sometimes fail to understand the priority and relevance of each data source, whether operational or analytical, and how it is linked to a specific business domain.
4) They need to master every single tool (!) that each domain can use to handle data.
The result is always the same: frustrated consumers/clients fighting for their priority in the backlog, and an overburdened data platform team.
The Data Mesh “thing”
There is a lot of buzz around data mesh, but essentially Data Mesh is one simple thing: treat domain data as a product and as a first-class concern, and treat the data lake, ETLs and pipelines as a second-class concern.
So the data lake (or lakehouse, or even data warehouse) moves to the back seat: it is the “implementation”. The driver is a set of domain-oriented data products or services that can work together and deliver your data as a product.
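To make the "driver versus back seat" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the names `StorageBackend`, `InMemoryLake` and `DataProduct`, the `sales` domain, and the `<domain>/<name>` storage convention. It is not a reference implementation of Data Mesh. It only shows consumers talking to a domain-owned data product while the storage behind it stays an implementation detail.

```python
from dataclasses import dataclass
from typing import Protocol


class StorageBackend(Protocol):
    """Whatever sits in the back seat: a lake, lakehouse, or warehouse."""

    def load(self, dataset: str) -> list[dict]: ...


class InMemoryLake:
    """Toy stand-in for a real data lake, used only for illustration."""

    def __init__(self, datasets: dict[str, list[dict]]) -> None:
        self._datasets = datasets

    def load(self, dataset: str) -> list[dict]:
        # Return a copy so callers cannot mutate the stored rows list.
        return list(self._datasets[dataset])


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data product; consumers use this, not the lake."""

    domain: str
    name: str
    owner: str
    backend: StorageBackend

    def read(self) -> list[dict]:
        # "<domain>/<name>" is a hypothetical storage convention.
        return self.backend.load(f"{self.domain}/{self.name}")


# The lake is only the implementation behind the product.
lake = InMemoryLake({"sales/orders": [{"order_id": 1, "amount": 42.0}]})
orders = DataProduct(domain="sales", name="orders", owner="sales-team", backend=lake)

# The "driver": a catalog of domain-oriented data products.
catalog = {f"{p.domain}.{p.name}": p for p in [orders]}
rows = catalog["sales.orders"].read()
```

In this sketch, swapping `InMemoryLake` for a warehouse-backed class would not change how a consumer reads `sales.orders`.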
Another Data Architecture?
It is not! At least from my point of view. It is really only a paradigm shift: the same tools (essentially those of the third generation of Data Architectures) but with different priorities. The expected outcome is that data becomes the product, with a focus on two things:
- Domain Oriented Data Services (aligned with the source or with the consumption)
- Product thinking (Data as a Product)
Domain Oriented Data Services
In today’s post we will talk about the concept of Domain Oriented Data Services, which allow organizations to move from a monolithic approach to data (Sources = Ingest; Consumers = Serve) to a more domain-oriented one, where the domains can be aligned in different ways to provide you with Data as a Product.
After seeing a few data architectures and their approach to data, we agree on the following: most organizations don’t design data as a product or a service, which is what we marked with the green arrow.