How to Measure the Cost and Impact of Each Data Product?
How do you measure the cost of each data product? We do it by capturing the flow of data autonomously.
Our data command center observes a company’s entire data universe. Here is how it works. In the cloud era, all of your data activity is recorded in machine-generated data such as audit logs, and we collect those logs to see the entire value chain. Our intelligence can build your lineage automatically; we only need a few annotations to get started.
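As an illustration only (a simplified sketch, not the platform's implementation), the step from audit logs to lineage can be pictured as turning each logged activity into "read X, wrote Y" edges. The `LogEvent` record below is a hypothetical, already-normalized schema: real Power BI, Redshift, ADF or Fivetran logs each have their own format and would first have to be mapped onto something like it. The asset names are also illustrative.

```python
from dataclasses import dataclass


# Hypothetical normalized audit-log record. Each real log source would need
# its own parser to produce these three fields.
@dataclass(frozen=True)
class LogEvent:
    tool: str                 # system that emitted the event, e.g. "fivetran"
    reads: tuple[str, ...]    # fully qualified assets the activity read
    writes: tuple[str, ...]   # fully qualified assets the activity wrote


def lineage_edges(events):
    """Return the set of (upstream, downstream, tool) edges implied by the events."""
    edges = set()
    for event in events:
        for src in event.reads:
            for dst in event.writes:
                edges.add((src, dst, event.tool))
    return edges


# Two illustrative events: a Fivetran load and a Power BI report refresh.
events = [
    LogEvent("fivetran", ("rds.salesorderheader",), ("redshift.salesorderheader",)),
    LogEvent("powerbi", ("redshift.salesorderheader",), ("powerbi.Online vs. Store Sales",)),
]

for edge in sorted(lineage_edges(events)):
    print(edge)
```

Each edge records which tool moved the data, which is what later lets activity and cost be attributed per platform.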
For example, if your organization uses PowerBI, we can connect to its logs and identify every report and dashboard currently in use across the company. Tell our system which business domain a few of those reports belong to, and that alone is enough for our automation to kick in.
We can then trace where the data for each report came from. In this case, the source of “Online vs. Store Sales” was a table called “salesorderheader” in your AWS Redshift warehouse. Our platform registered this table in the catalog of the Sales Analytics business domain. From that point on, any other activity we find on this table is automatically classified into Sales Analytics.
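A minimal sketch of that classification rule, assuming the catalog is a simple mapping from asset name to a set of domains (a set, because one table may serve several domains). The asset naming scheme and the uncataloged `redshift.customer` table are hypothetical; only the report and table names come from the example above.

```python
# Hypothetical domain catalog: asset name -> set of business domains.
# The report entry stands in for the user's one-time annotation.
catalog = {"powerbi.Online vs. Store Sales": {"Sales Analytics"}}


def register_sources(catalog, report, source_tables):
    """Add a report's source tables to every domain the report belongs to."""
    for table in source_tables:
        catalog.setdefault(table, set()).update(catalog[report])


def classify(activity_assets, catalog):
    """Return the domains an activity belongs to, given the assets it touched."""
    domains = set()
    for asset in activity_assets:
        domains |= catalog.get(asset, set())
    return domains


# Lineage shows that the report reads redshift.salesorderheader.
register_sources(catalog, "powerbi.Online vs. Store Sales", ["redshift.salesorderheader"])

# A later query touching that table (plus a hypothetical uncataloged table)
# is classified into Sales Analytics.
print(classify(["redshift.salesorderheader", "redshift.customer"], catalog))
# → {'Sales Analytics'}
```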
In particular, the platform found that this table is used by scheduled dbt jobs, that the resulting data is moved into an Azure Postgres database by ADF, and that it populates another PowerBI report called Sales Performance Report. Upstream, we discovered that this table was one of 19 tables that Fivetran periodically loads into the warehouse from a transactional database in RDS, so those 19 RDS tables were also added to this business domain’s catalog. Like a spider weaving its web, our platform built out the lineage across two cloud platforms and an independent data platform, entirely on its own.
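The "spider web" expansion behaves like a graph traversal: starting from the annotated report, follow lineage edges both downstream (dbt, ADF, the second report) and upstream (Fivetran, RDS), adding every asset reached to the domain. The sketch below uses a plain breadth-first search and is illustrative, not the platform's code. The dbt model and Postgres table names are hypothetical, and only one of the 19 RDS tables is shown.

```python
from collections import defaultdict, deque

# Lineage edges (upstream, downstream) from the example. The dbt model and
# Postgres table names are hypothetical placeholders; only one of the 19
# Fivetran-replicated RDS tables is shown.
EDGES = [
    ("rds.salesorderheader", "redshift.salesorderheader"),                  # Fivetran
    ("redshift.salesorderheader", "powerbi.Online vs. Store Sales"),        # PowerBI
    ("redshift.salesorderheader", "redshift.dbt_sales_model"),              # dbt job
    ("redshift.dbt_sales_model", "azure_postgres.sales_performance"),       # ADF copy
    ("azure_postgres.sales_performance", "powerbi.Sales Performance Report"),
]


def propagate_domain(edges, seed):
    """Return every asset reachable from the seed, walking lineage in both directions."""
    neighbours = defaultdict(set)
    for up, down in edges:
        neighbours[up].add(down)
        neighbours[down].add(up)

    seen, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


sales_analytics = propagate_domain(EDGES, "powerbi.Online vs. Store Sales")
for asset in sorted(sales_analytics):
    print(asset)
```

An unrestricted two-way walk will eventually sweep in any table shared by many domains (a common customer dimension, for instance), so a real system would need stopping rules at such shared assets; this sketch deliberately omits them.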
With this lineage in place, we can measure the volume of all activities in Sales Analytics and tell you the real cost of producing those PowerBI reports for the sales team.
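One common way to turn activity volume into cost is proportional allocation: each platform's bill is split according to each domain's share of the activity on that platform. The sketch below assumes that approach; it is not necessarily how the platform does it. All bill amounts, volumes, and the "Finance" domain are made-up illustration values, and volume units may differ per platform (for example, query time on Redshift and rows synced on Fivetran).

```python
def allocate_cost(bills, volumes, domain):
    """Split each platform's bill in proportion to the domain's share of activity volume.

    bills:   platform -> total bill for the period (hypothetical figures)
    volumes: platform -> {domain -> activity volume on that platform}
    """
    total = 0.0
    for platform, bill in bills.items():
        by_domain = volumes.get(platform, {})
        platform_volume = sum(by_domain.values())
        if platform_volume == 0:
            continue  # no measured activity, so nothing to allocate
        total += bill * by_domain.get(domain, 0) / platform_volume
    return total


# Made-up numbers for illustration only.
bills = {"redshift": 1000.0, "adf": 200.0, "fivetran": 300.0}
volumes = {
    "redshift": {"Sales Analytics": 25, "Finance": 75},
    "adf": {"Sales Analytics": 10, "Finance": 10},
    "fivetran": {"Sales Analytics": 19, "Finance": 38},
}

print(allocate_cost(bills, volumes, "Sales Analytics"))
# → 450.0
```

Because every platform's bill is fully distributed across the domains that used it, the per-domain costs add up to the total bill (here 450.0 for Sales Analytics plus 1050.0 for Finance equals 1500.0).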
If you have any questions, please leave a comment below! Next time, I will dive into how we collect the billing data to populate the cost across cloud & data platforms.