We are developing a metric store that will contain two types of metrics:
- Simple metrics
- Formula metrics
Simple metrics are aggregations, such as sums and averages, computed through the semantic layer. We are facing challenges when scaling this to more complex formula metrics.
Formula metrics are expressions built on top of other metrics; evaluating the expression produces the metric. Users enter the formula in a formula builder.
Example of a formula metric:
CAC Metric = OFFSET((S&M Expenses - Customer Marketing Expenses), -1) / New Customers
Each metric in the formula has its own dataset and its own time dimension.
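To make the formula concrete, here is a minimal pandas sketch of the result we expect. It assumes that `OFFSET(x, -1)` means "the value of `x` one period earlier" and that all three metrics have already been brought to a monthly grain. The numbers and the `offset` helper are hypothetical and only illustrate the semantics.

```python
import pandas as pd

# Hypothetical monthly values; in the real system each series would come
# from its own dataset, keyed by its own time dimension.
months = pd.period_range("2024-01", periods=4, freq="M")
sm_expenses = pd.Series([100.0, 120.0, 150.0, 130.0], index=months)
customer_marketing = pd.Series([20.0, 30.0, 50.0, 40.0], index=months)
new_customers = pd.Series([8.0, 10.0, 9.0, 20.0], index=months)


def offset(series: pd.Series, periods: int) -> pd.Series:
    """Assumed meaning of OFFSET: a negative value looks back in time.

    offset(x, -1) at month t returns the value of x at month t-1.
    """
    return series.shift(-periods)


# CAC = OFFSET((S&M Expenses - Customer Marketing Expenses), -1) / New Customers
cac = offset(sm_expenses - customer_marketing, -1) / new_customers
print(cac)
```

The first month has no previous period, so its CAC is `NaN`. The division only works because all three series share the same monthly index, which is not the case for our real datasets.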
We did our research and arrived at two approaches:
- Compose SQL and apply database UDFs
- Compose Dataframes and apply pandas/spark UDFs
In the SQL approach, we are unsure how to store each existing metric's SQL query alongside the stored metric, and how to compose the CTEs dynamically.
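A rough sketch of the kind of CTE composition we have in mind is below. It assumes every stored metric query returns exactly two columns, `period` and `value`, and that `OFFSET(x, -1)` can be translated to the SQL `LAG` window function. The `StoredMetric` record, `compose_query`, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMetric:
    name: str  # used as the CTE alias; must be a trusted identifier
    sql: str   # stored query, assumed to return columns (period, value)


def compose_query(metrics: list[StoredMetric], expression: str) -> str:
    """Wrap each stored metric query in a CTE and join them on `period`."""
    ctes = ",\n".join(f"{m.name} AS (\n  {m.sql}\n)" for m in metrics)
    first, *rest = metrics
    # Inner joins keep only the periods that are present in every metric.
    joins = "".join(f"\nJOIN {m.name} USING (period)" for m in rest)
    return (
        f"WITH {ctes}\n"
        f"SELECT period, {expression} AS value\n"
        f"FROM {first.name}{joins}\n"
        f"ORDER BY period"
    )


metrics = [
    StoredMetric("sm_expenses",
                 "SELECT month AS period, SUM(amount) AS value "
                 "FROM expenses WHERE category = 'S&M' GROUP BY month"),
    StoredMetric("customer_marketing",
                 "SELECT month AS period, SUM(amount) AS value "
                 "FROM expenses WHERE category = 'Customer Marketing' GROUP BY month"),
    StoredMetric("new_customers",
                 "SELECT signup_month AS period, COUNT(*) AS value "
                 "FROM customers GROUP BY signup_month"),
]

# OFFSET(x, -1) translated to LAG(x); NULLIF guards against division by zero.
expression = (
    "LAG(sm_expenses.value - customer_marketing.value) OVER (ORDER BY period)"
    " / NULLIF(new_customers.value, 0)"
)
query = compose_query(metrics, expression)
print(query)
```

Note that `LAG` steps back one row, not one calendar period, so this only matches the intended offset if the joined result has no missing periods. The metric names and the expression are interpolated directly into the SQL, so they would have to come from a validated formula parser rather than raw user input.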
In the DataFrames approach, we are unsure how to handle multiple DataFrames that each have their own time dimension and still compute a single result.
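One direction we are considering is to normalize every metric to a common period index before evaluating the formula. The sketch below assumes a monthly target grain and that each metric can be rolled up with a simple aggregation. The `to_monthly` helper, the column names, and the data are hypothetical.

```python
import pandas as pd


def to_monthly(df: pd.DataFrame, time_col: str, value_col: str,
               name: str, how: str = "sum") -> pd.Series:
    """Roll a metric up to monthly periods, whatever its time column is called."""
    periods = pd.to_datetime(df[time_col]).dt.to_period("M")
    return df.groupby(periods)[value_col].agg(how).rename(name)


# Net expenses are already monthly and keyed by `booking_month`.
expenses = pd.DataFrame({
    "booking_month": ["2024-01-01", "2024-02-01"],
    "amount": [80.0, 90.0],
})
# New customers arrive at a daily grain and are keyed by `signup_date`.
signups = pd.DataFrame({
    "signup_date": ["2024-01-05", "2024-01-20", "2024-02-03",
                    "2024-02-10", "2024-02-25"],
    "customers": [3, 5, 4, 4, 2],
})

# concat aligns the two series on their shared monthly PeriodIndex.
aligned = pd.concat(
    [
        to_monthly(expenses, "booking_month", "amount", "net_expenses"),
        to_monthly(signups, "signup_date", "customers", "new_customers"),
    ],
    axis=1,
)

# Previous month's net expenses divided by this month's new customers.
aligned["cac"] = aligned["net_expenses"].shift(1) / aligned["new_customers"]
print(aligned)
```

This works for the toy data, but `shift(1)` moves by one row rather than one calendar month, so gaps in the index would need to be filled first. It also does not address metrics whose grain cannot be rolled up with a plain sum or average.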
We are looking for advice on how to implement and scale formula metrics effectively in our system. Specifically, we are interested in best practices, tools, or strategies that could help overcome the limitations we are hitting with the semantic layer for these advanced use cases.