Business-driven Data Engineering

Data Engineering

Business-driven Data Engineering

By Sang Kim and Kamesh Raghavendra | March 7, 2025

Shift from Workflow-based to Data Consumption-based Business

The past 3-5yrs have seen a rapid adoption of data-driven decision making in the enterprise at unprecedented operational levels of the business. What used to be merely business intelligence for strategic decision making (e.g. formulating enterprise-wide annual operating and cost plan) has now grown to become data consumption-based business operations (e.g. negotiating with a vendor on a purchase order). This is evidenced by the leading cloud data warehouses (like Databricks and Snowflake) choosing a consumption-based pricing model over a data volume or number of users-based model. This shows that the frequency and the operational depth in the utilization of data (through compute) are the best indicators of value-created in an enterprise.

A quick browse-through of their websites highlights both the key industry verticals and business functions that are rapidly turning data-driven. Leading verticals include financial services, health & life sciences, media & communications, e-retail & CPG and manufacturing. Business functions across sales & marketing, supply-chain & procurement, finance & risk and R&D are embracing data-consumption driven operations.

Data consumption is driving business operations away from traditional and static rules-based workflows, where human operators manually wrangled with limited data and drove the workflows towards decisions. In today’s world, business growth no longer needs cost-prohibitive workflow implementations or large teams of human operators.

Business Growth driven by Efficiency in Data Consumption

The efficiency of data consumption has displaced workflow upkeep and large human operator teams to become the key driver of business growth. This has become quite apparent in the enterprise hiring/staffing trends despite growth in the businesses. This data consumption efficiency builds a consistent competitive advantage across thousands of operational decisions taken in the enterprise every day. Several aspects contribution to building data consumption efficiency of an enterprise – here are some important ones:

Democratizing Data for Business Users: The simplicity, ease of access and the degree of self-service in data consumption by business users define its efficiency. Any complexity in these aspects hinders consumption at operational levels and restricts it to only senior management levels (who have dedicated data engineering teams).

Combining the Macro and Micro Contexts: Data consumption needs to be empowered with both the macro enterprise-wide context and the micro-operational decision-level context to successfully translate into a competitive differentiation for the business. For instance, data consumption for vendor price negotiation needs to effectively put the enterprise-wide trends in costs and margins together with the price-quality sensitivity patterns of the specific vendor to arrive at an intelligent outcome.

Scaling-up Frequency from Strategic to Operational: Operations involves 100,000s of business users making decisions every minute, every hour. This sharply contrasts in scale the traditional strategic decisions made by 100s of execs every month or quarter. Enterprises need a paradigm shift in data infrastructure to scale-up data consumption at operational levels.

Cloud Data Warehouses take the Centerstage

Cloud data warehouses are leading the paradigm shift in data infrastructure needed for scaling-up data consumption for business operations. They are reorganizing the enterprise’s cloud compute infrastructure, and driving the shift from rules-based vanilla workflows to data consumption-based business operations. This makes them radically different from traditional data warehouses of the yester years – in fact, it is misleading to even use the same phrase ‘data warehouse’ to describe cloud data warehouses; and hence the industry has coined new words like ‘lakehouse’. The crux of the paradigm shift is the innocuous-sounding change from the traditional extract-transform-load (ETL) to the new extract-load-transform (ELT) approach.

This paradigm shift is delivered by modern cloud data warehouses by making the following three foundational shifts in their architecture:

Shift from the ETL world	to the ELT world
*Single Sandwiched “Transform” in ETL* Rigid, single, enterprise-wide transformation of data to serve all decision-making needs in one go before loading it in the traditional ETL approach.	*Multiple Business-specific ELT “Transform”s* Multiple, agile, business-specific transformations of pre-loaded data in the ELT approach drives the scaling-up of compute infrastructure for consumption.
Rigid “Load” with Meta Data Schemas Rigid meta data management schemas of traditional warehouses attempted to crunch all business decision needs.	Reliable “Load” with Data Catalogs Pre-loading of data into the cloud data warehouse focuses more on data quality and reliability within the constructs of data catalogs (e.g. Unity from Databricks and Polaris from Snowflake) offering immense flexibility to manage structured, semi-structured and unstructured data.
“Extraction” for Transformation The rigid transformation process drove brittle “data pipes” connecting data sources (e.g. ERP systems like SAP) to traditional warehouses.	“Extraction” for Efficient Sharing with Data Catalog The data quality and reliability requirements of catalogs can be brought to the data sources creating efficient and reliable data sharing directly from the sources (e.g. Delta sharing from SAP to Databricks).

Advent of Business-driven Data Engineering

Data engineering defines, develops and deploys the transformation of data; and has traditionally been an organization within corporate IT tasked with the maintenance of the ETL data warehouse. However, with data consumption driven business operations, the single enterprise-wide data transformation is replaced by multiple business-specific transformations. Hence, data engineering is expanding from being a mere IT function to becoming an important part of business operations. The traditional shared data engineering within IT is now tasked with managing the shared data sources (the new “extract”) and the data catalogs (the new “load”).

The multitude of business-specific data transformations needs a new business-driven data engineering function with distinct features to empower data consumption at scale:

Both enterprise customers and vendors of the modern cloud data warehouses are grappling with the delivery of business-driven data engineering. Shared data engineering teams in IT organizations are stretched thin and struggle to meet the pressing data consumption demands from the business. There is a whole mushrooming cottage industry of mid-sized system integrators, who have emerged as stop-gap intermediary service providers between the business and IT organizations within enterprises. The challenges in scaling-up business-driven data engineering are constraining the utilization of modern cloud data warehouses and denting the data consumption efficiency of the business. This has opened the demand and a new market opportunity for on-demand business-driven data engineering.

In the next blog, we introduce Purgo AI – the industry’s first on-demand data engineering product for the business user – that address this demand and fulfills the vision of data consumption driven business operations.