Hot takes on what we get wrong about the metrics layer and where it fits in the modern data stack
The metrics layer has been all the rage in 2022. It’s just forming in the data stack, but I’m so excited to see it coming alive. Recently dbt Labs incorporated a metrics layer into their product, and Transform open-sourced MetricFlow (their metric creation framework).
A few weeks ago, I was lucky enough to talk about the metrics layer with two of the most prolific product thinkers in the space: Drew Banin (Co-founder of dbt Labs) and Nick Handel (Co-founder of Transform).
We covered everything from the basics of a metrics layer and what people get wrong about it to real-life use cases and its place in the modern data stack.
Before we begin… WTF actually is a metrics layer? Today metrics are often split across different data tools, and different teams or dashboards end up using different definitions for the same metric. The metrics layer aims to fix this by creating a common set of metrics and their definitions.
Drew and Nick dove more into this definition, so let’s jump right into all of their insights and fiery takes. We talked for over an hour, so this is a condensed, edited version of our discussion. (Check out the full recording here.)
How would you explain the metrics layer to a beginner data analyst?
Since it’s a new concept, there’s a lot of confusion about what the metrics layer really is. Drew and Nick cut through the confusion with succinct definitions about creating a common source of truth for metrics.
Drew Banin: “The shortest version I can think of is…”
Define your metrics once and reference them everywhere so that if your metrics ever change, you get updated results everywhere you look at data.
Nick Handel: “The way that I’ve explained it to family and people who are totally out of the space is just, businesses have data. They use that data to measure their operations. The point of this software is basically to make it really easy for the data analysts (the people who are responsible for measuring that data) to define those metrics, and make it easy for the rest of the business to consume that single correct way to measure that data.”
What’s the real problem the metrics layer is trying to solve?
Nick and Drew explained that the metrics layer is motivated by two key ideas: precision and trust.
Nick: “I think we’re all pretty convinced about the value of data. We have all kinds of different, interesting things that we can do with data, and the cost of doing those things is fairly high. There’s a bunch of work to get the data into the place where we can go and do anything that’s actually interesting and valuable.
“Why does this matter? It’s supposed to make that whole process of getting the data ready for that source of value much easier and also more trustworthy.”
It comes down to those two things: productivity and trust. Is it easy to produce the metric, and is it the right metric? And can you put it into whatever application you’re trying to serve?
Drew: “That’s really good framing. I just look inwards at our organization. The very first metric we ever created was weekly active projects: how many dbt projects were run in the previous seven days? Now we’re about 250 people and we’re measuring so many things across the business with lots of new people around.”
We’re trying to make sure that when someone says ‘weekly active accounts’ or ‘MRR’ or ‘MRR split by managed versus self-service’, we all mean exactly the same thing.
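To make the “define once, reference everywhere” idea concrete, here is a minimal Python sketch built around a metric like the weekly active projects one Drew mentions. It is an illustration under stated assumptions, not how dbt or MetricFlow implement metrics: the `RUNS` log, the `weekly_active_projects` function, and the two consumers (`dashboard_tile`, `weekly_report_row`) are all hypothetical names.

```python
from datetime import date, timedelta

# Hypothetical run log: (project_id, run_date). Illustrative data only.
RUNS = [
    ("a", date(2022, 6, 1)),
    ("a", date(2022, 6, 6)),
    ("b", date(2022, 6, 5)),
    ("c", date(2022, 5, 20)),
]


def weekly_active_projects(runs, as_of):
    """The single definition: distinct projects with a run in the 7 days ending on as_of."""
    window_start = as_of - timedelta(days=7)
    return len({project for project, day in runs if window_start < day <= as_of})


def dashboard_tile(as_of):
    """One consumer: a dashboard tile that references the shared definition."""
    return f"Weekly active projects: {weekly_active_projects(RUNS, as_of)}"


def weekly_report_row(as_of):
    """Another consumer: a report row that references the same definition."""
    return {
        "week_ending": as_of.isoformat(),
        "weekly_active_projects": weekly_active_projects(RUNS, as_of),
    }
```

Because both consumers call the same function, changing the window or the counting rule in one place changes every downstream number at once, which is the behavior the quote describes.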
Drew and Nick also emphasized change management as both a major challenge and use case for the metrics layer.
Drew: “I think a lot about the change management part of it. If you get the right people together, you can precisely define a metric at that point in time. But inevitably your business or product will evolve. How do you keep it in sync in perpetuity? That’s the hard part.”
Nick: “I really agree with that. Especially if change management is happening when there are only a few people in the room, and other people who are relying on the same metrics weren’t a part of that conversation.”
How should we think about the metrics layer, and how should it interplay with other components of the modern data stack?
Nick broke the metrics layer down into four key components (semantics, performance, querying, and governance), while Drew focused on its role as a network connecting a diverse set of data tools.
Nick: “The way that I think about the metrics layer is basically four pieces. There are the semantics: How do I go and define this metric? This can range from ‘Here’s a SQL snippet’ or ‘This is the definition of the metric’ to a full semantic layer that has entities and measures and dimensions and relationships.
“Then there’s performance. Great, now I have this semantic model. How do I go and build logic against it, executed against some compute environment (whether it’s a warehouse or just a compute engine on a data lake)?
“Then there’s, how do I query this thing? What are the interfaces that I use to pull it out of the data warehouse or data lake and resolve it into this quantitative object that I can then go and use in some analysis? That includes both broad ways of consuming data (like a Python interface or GraphQL or a SQL interface) as well as direct integrations (a tool that builds a custom wrapper around a REST or GraphQL API and builds a really first-class experience).
“Then the last piece is governance. There’s organizational governance and technical governance. Organizational governance meaning, does the finance leader agree on the human-understandable definition of revenue in the same way that the technical person who’s defining the logic defines that code?”
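As a rough illustration of how the semantics and querying pieces relate, the sketch below defines a tiny metric spec (a measure, its allowed dimensions, and an owner for governance) and compiles a request into SQL. Everything here is an assumption made for the example: `MetricSpec`, `compile_query`, the `orders` table, and the `owner` field are hypothetical and do not reflect the actual dbt or MetricFlow specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    name: str          # metric name exposed to consumers
    table: str         # hypothetical source table in the warehouse
    measure: str       # aggregation expression (the "semantics")
    dimensions: tuple  # columns consumers are allowed to group by
    owner: str         # who signs off on the definition (governance)


def compile_query(spec, group_by=()):
    """Turn a metric request into a SQL string the warehouse could execute."""
    unknown = [dim for dim in group_by if dim not in spec.dimensions]
    if unknown:
        raise ValueError(f"Unsupported dimensions for {spec.name}: {unknown}")
    select_cols = list(group_by) + [f"{spec.measure} AS {spec.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {spec.table}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


revenue = MetricSpec(
    name="revenue",
    table="orders",
    measure="SUM(amount)",
    dimensions=("plan", "region"),
    owner="finance",
)
```

Calling `compile_query(revenue, ("plan",))` produces one query that any interface (a Python client, a SQL proxy, a BI integration) could send to the warehouse, while requests for dimensions outside the spec are rejected rather than silently improvised.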
Drew: “Just to give an alternate framing: We can think of it in terms of the experience for the person who wants to consume data to answer some question or solve some problem, and then also the people building the tools where those folks are consuming the data.
“It’s a little bit at odds with each other, because the business consumers want to see the exact same metric in every single tool and they want it all to update in real time. So you have this big network of different tools that conceivably need to talk to each other. That’s a hard thing to organize and make happen in practice.
That’s why the idea that we call this the ‘metrics layer’ makes sense. It’s a single abstraction layer that everything can interface with so that you get precise and consistent definitions in every single tool.
“To me, that’s where metadata really shines. Like, this is the metric, this is how it’s defined, this is its provenance, here’s where it’s used. This isn’t actually the data itself. It’s attributes of the data. That’s the information that can synchronize all these different tools together around shared data definitions.”
What metadata should we be tracking about our metrics, and why?
Nick and Drew shared that metadata is important for understanding metrics because companies lose important tribal knowledge about data outages and anomalies over time as staff changes.
Nick: “The metric is one of the most constant objects in an organization’s life.
Products change, tables change, everything changes. Even the definitions of these metrics evolve. But most businesses end up tracking the same North Star metrics from the very early days. If you can attach metadata to it, that’s incredibly valuable.
“At Airbnb, we tracked nights booked. It was important from the very early days when BI was literally a printed-off graph that they put on the wall, and it’s still an important metric that the company talks about in the public earnings calls. If we had been tracking important metadata through time of what was happening to that metric, there would be a wealth of information that the company could use.”
They explained that these changes are why it’s crucial for the metrics layer to interact with both the data layer and the business layer: to capture context that affects data analysis and quality.
Nick: “Airbnb had a big product launch, and different metrics spiked in all different directions. Today, I’m not sure that a data scientist at Airbnb could really understand what happened. They’re trying to use historical data to understand things, and they just don’t have that context. If anything, they really only have context for the last two or three years, when there was somebody who’s in the business who remembers what happened, who did the analysis, and so on.”
Drew: “There’s a lot of this that ends up being technical, in terms of how tools integrate with each other, and how you define the metrics and version them. But so much of it is indeed the social and business context.
In practice, the people who have been around for the longest time have the most context and probably know more than any of our actual systems do.
“We had a period where we had a little bit of data loss for some events we were tracking. It looked like, I think it was, May 2021 was the worst month ever. But really it was just like, no, we didn’t collect the data.
“How would you know that? Where does that information live? Is it a property of the source dataset that propagates through to the metrics? Who’s responsible for encoding that?”
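One possible answer to “where does that information live?” is to store annotations next to the metric itself, so anyone reading a value can see the known caveats for that period. The sketch below is only one way to model this. The `Annotation` record, the `annotations_for` lookup, and the example note are hypothetical, and the May 2021 date range simply mirrors Drew’s anecdote.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    metric: str  # metric the note applies to
    start: date  # first affected day (inclusive)
    end: date    # last affected day (inclusive)
    note: str    # human-readable context
    author: str  # who recorded it, so the context outlives staff changes


# Hypothetical annotation store; in practice this might live in a metadata platform.
ANNOTATIONS = [
    Annotation(
        metric="weekly_active_projects",
        start=date(2021, 5, 1),
        end=date(2021, 5, 31),
        note="Partial event loss: values undercount real activity.",
        author="data-eng",
    ),
]


def annotations_for(metric, day):
    """Return every recorded caveat that covers this metric on this day."""
    return [a for a in ANNOTATIONS if a.metric == metric and a.start <= day <= a.end]
```

A dashboard or notebook could call `annotations_for` whenever it renders a data point, turning tribal knowledge into something a new analyst can discover without asking around.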
What are the real use cases for a metrics layer?
Drew and Nick called out a variety of potential applications for the metrics layer, e.g. improving BI and analytics for early-stage data teams, helping business and data people use data models in the same way, and making valuable but time-consuming applications (like experimentation, forecasting, and anomaly detection) possible for all companies.
Drew: “I think some of the use cases around BI and analytics are the most clear, obvious, and present for a lot of companies.
Many companies out there are not at the data science and machine learning part of their journeys yet. Things that make business intelligence and reporting better (more precise and more consistent) cover 90% of the problems that they’re trying to solve with data.
“Casting our minds forward, I think that there could be a ton of benefits to leveraging metrics for data science use cases.
“In particular, one of the things that we’ve seen people do with dbt that was really formative for me: they’d build these data models and then use them both for BI reporting and also to power data science applications and modeling. The fact that the data scientist and the BI analysts are using the same datasets means that it’s much more likely that they’re consuming the same data in the same way. When you extend it to metrics, there’s like a really natural way to make that happen too.”
Nick: “I do partly agree with that. But also there are a lot of data science and machine learning applications that require very different datasets than what a metrics store produces.
“In analytics applications, you try to include as much relevant information as possible. If you have an ecommerce store, people can browse it logged out. So you try to dedupe users and identify users as they log into devices. There’s a whole practice of trying to figure out which entities are using your service. That’s really important for analytics because it allows us to get a much clearer picture. But you don’t want to do that for machine learning, because that’s all information leakage and that will ruin your models.
With machine learning, you try to get as close to the raw datasets as possible. With analytical applications, you try to process that information into the clearest and best picture of the world.
“One of the applications that I always think about is experimentation. The reason we built a metrics repo originally was experimentation.
“There were 15–20 people on the data team at the time. We were trying to run more product experiments, and we were doing everything manually. It was really time intensive to go and take assignment logs and metric definitions and join them together.
Basically, we needed some programmatic way to go and construct metrics. It’s a hugely valuable application for companies that do it, but very few companies have the infrastructure or build the tooling to do this. I think that that’s really unfortunate. And it’s probably the thing that I’m most excited about with the metrics layer.
“If you think about every data application as having some cost and some benefit: the more you can reduce the cost of pursuing that application, the clearer the justification becomes to pursue some new application.
“I think experimentation is one of these examples. I also think about anomaly detection or forecasting. These are things that I think most companies don’t do, not because they’re not valuable, but just because producing the datasets to even get started on those applications is really hard.”
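The manual step Nick describes, joining experiment assignment logs to a metric definition, is easy to picture in code. The following sketch is a simplified illustration rather than Airbnb’s or Transform’s tooling: the `ASSIGNMENTS` and `BOOKINGS` data and the `nights_booked_per_user` and `metric_by_variant` helpers are hypothetical, and a real analysis would also need significance testing.

```python
from collections import defaultdict

# Hypothetical assignment log: user_id -> experiment variant.
ASSIGNMENTS = {"u1": "control", "u2": "treatment", "u3": "treatment", "u4": "control"}

# Hypothetical fact rows: (user_id, nights booked in one reservation).
BOOKINGS = [("u1", 2), ("u2", 3), ("u2", 1), ("u3", 2)]


def nights_booked_per_user(bookings):
    """The metric definition, written once: total nights booked per user."""
    totals = defaultdict(int)
    for user_id, nights in bookings:
        totals[user_id] += nights
    return dict(totals)


def metric_by_variant(assignments, per_user_metric):
    """Join assignments to the metric and average it within each variant.

    Users with no recorded activity count as 0 so they still dilute the mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_id, variant in assignments.items():
        sums[variant] += per_user_metric.get(user_id, 0)
        counts[variant] += 1
    return {variant: sums[variant] / counts[variant] for variant in counts}


results = metric_by_variant(ASSIGNMENTS, nights_booked_per_user(BOOKINGS))
```

Once the metric is a reusable definition, the same join works for any experiment and any metric, which is what lowers the cost of running more experiments.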
Let’s jump into some questions about the metrics layer and the modern data stack.
First, let’s talk bundling vs unbundling. Should the metrics layer even be a separate layer, or should it be part of an existing layer in the stack?
As with every debate in the data ecosystem, we ended up just answering, it depends. Drew and Nick explained that how we solve this problem is ultimately more important than how we define that solution.
Drew: “I’m not in love with the way that we as an ecosystem talk about new tools as being layers, like the missing layer of the data stack. That’s the wrong framing.
“People that build applications don’t think about it that way. They have services, and the services can talk to each other. Some are internal services and some are SaaS services. It becomes a network of connected tools rather than exactly, say, four layers. No one runs an application anymore with exactly the Linux, Apache, MySQL, and PHP (LAMP) stack, right? We’re past that.
The word ‘layer’ makes sense only insofar as it’s a layer of abstraction. But otherwise, I reject the terminology, although I can’t think of anything too much better than that.
“The last thing I’m going to say on bundling and unbundling… For this thing to work, it does need to be an intermediary between a very wide network of different tools. Treating it as a boundary like that motivates which tools can build it and provide it. It’s not something you’d see from a BI tool, because it’s not really in a BI tool’s interest to provide the layer to every other BI tool, which is like the thing that you want from this.”
Nick: “I think I generally agree with that.
Basically, people have problems, and companies build technologies to solve problems. If people have problems and there’s a valuable technology to build, then I think it’s worth taking a shot and trying to build that technology and voicing those opinions.
“Ultimately, I think that there are good points there on the connection to different organizational workflows. This isn’t something that I think we’ve done a good job of explaining, but I think that the metrics store and the metrics layer are two different concepts.
“The metrics store extends the metrics layer to include this piece of organizational governance: how do you get a bunch of different business users involved in this conversation, and actually give them a role in something that, frankly, they have a huge stake in? I think that that’s something that’s not really captured in this conversation around the metrics layer, or headless BI, or any of these different terms. But it’s really, really important.”
For a traditional company that already has a data warehouse and BI layer, where does the metrics layer fit into their stack?
Again, the answer is that it depends (sigh). The metrics layer would live between the data warehouse and BI tool. However, every BI tool is different and some are friendlier to this integration than others.
Nick: “The metrics layer sits on top of the data warehouse and basically wraps it with semantic information. It then allows different endpoints to be consumed from and basically pushes metrics to those different places, whether they’re generic or direct integrations to those tools.”
Drew: “It ends up being very BI tool–dependent. There are some BI tools where this is a very natural type of thing to do, and others where it’s actually quite unnatural.”
If a company has already defined a ton of metrics within their BI tool, what should they do?
Nick and Drew explained that slow and steady wins the race when you aren’t starting from scratch. Instead of planning a massive overhaul, start with one team or tool, integrate a better metrics layer, and test how it works for your organization.
Nick: “I would advocate for not a massive ‘change everything at once’. I would advocate for, define some metrics, push those through the APIs and integrations, build something new, potentially replace something old that was hard to manage, and then go from there once you’ve seen how it works and believe in that philosophy.”
Drew: “I’m with you. I think something domain-driven makes a lot of sense. You can validate it and then expand. I’d probably start with… it depends on your tolerance, but the executive dashboard that goes to the CEO. Is that the best place to kick the tires? Maybe not. But if it works there, it’ll work everywhere.”
Can’t a metrics layer just be part of a feature store?
Since Nick has built multiple feature stores and metrics layers, he had a strong opinion on this topic: while the metrics layer and feature store are similar, they’re too fundamentally different to merge right now.
Nick: “I have a really strong opinion about this one because I’ve built two feature stores and three metrics layers. These two things are totally different.
“At the core, they’re both derived data. But there are so many nuances to building feature stores and so many nuances to building metric stores. I’m not saying that these two things will never merge; the idea of a derived data repository or something like that sounds wonderful. But I just don’t see it happening in the short term.
Everyone wants features to be specific to their model. Nobody wants metrics to be specific to their team or their consumption. People want metrics to be consistent. People want features to be unique and whatever benefits their model.
“Real-time versus batch: this is a super challenging problem in the feature space. Organizational governance is way more important for the metrics layer. The technical definitions are often different. The level of granularity is different for features: you go way finer with features than you do metrics.”
Do you believe a caching layer is essential for a metrics layer?
This was a resounding YES from both Drew and Nick. Caching makes the metrics layer fast, which is essential for ensuring that data practitioners actually use it. However, it’s important that this caching doesn’t replicate data.
Drew: “I think that the speed with which you can ask a question and get an answer back is really critical.
The difference between something taking a minute plus to come back and not coming back at all is negligible in a lot of cases. So, conceptually, I’m very aligned with the idea of caching metric data and being able to serve it up really quickly.
“I’ll just say (and I think we’ve been open about this in the past) we probably won’t do that for V1 of metrics inside dbt. But conceptually, I’m pretty aligned with that being an important part of the system long-term.”
Nick: “Caching is super important. Performance matters a ton, especially to business users. Even 10 seconds is less than an ideal experience.
“I think that there are two important nuances to caching. One is, what do I know ahead of time that I want, and how do I pre-compute that and make that really snappy? And then if I do compute something, how do I then reuse it so that it’s fast next time? I think that’s the point of a caching layer.
“The other one is, I don’t think that caching needs to happen outside of the cloud data warehouse or the data lake. I think that you can use those systems. The replication of data, in my mind, is just so costly and so hard to manage.”
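Nick’s two points (reuse what you have already computed, and keep the cache inside the warehouse) can be sketched with a materialized result table. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for a cloud warehouse. The `cached_metric` helper, the `orders` table, and the cache table name are hypothetical, and invalidation and refresh are deliberately left out.

```python
import sqlite3


def cached_metric(conn, cache_table, metric_sql):
    """Return rows for metric_sql, materializing them in cache_table on first use.

    The cache is just another table in the same database, so no data leaves it.
    cache_table and metric_sql must come from trusted code, never from user input.
    """
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (cache_table,),
    ).fetchone()
    if exists is None:
        # First request: compute once and store the result next to the source data.
        conn.execute(f"CREATE TABLE {cache_table} AS {metric_sql}")
    return conn.execute(f"SELECT * FROM {cache_table}").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (plan TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("managed", 100.0), ("self_service", 20.0), ("managed", 50.0)],
)

REVENUE_BY_PLAN = "SELECT plan, SUM(amount) AS revenue FROM orders GROUP BY plan"
first = cached_metric(conn, "cache_revenue_by_plan", REVENUE_BY_PLAN)

# New data arrives; the second call is served from the cache without recomputing.
conn.execute("INSERT INTO orders VALUES ('managed', 999.0)")
second = cached_metric(conn, "cache_revenue_by_plan", REVENUE_BY_PLAN)
```

The second call returns the stored result even though new rows have landed, which shows both the speed benefit and the cost: a real caching layer needs a policy for when cached tables are refreshed or dropped.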
Finally, if you were handed a megaphone and could blast out a message for the entire data world, what would you say?
Drew:
There are a lot of problems in data that you can solve with technology, but some of the hardest and most important ones you need to solve with conversations and people and alignment and sometimes whiteboards. Knowing which kind of problem you’re trying to solve at any given time is going to help you pick the right kind of solution.
Nick:
I think the metrics layer is basically a semantic layer with an additional concept of a metric, which is super important. So I would just say, the metrics layer should be backed by a general-purpose semantic layer. The spec and the definition of that semantic layer and the abstractions is so incredibly important.
Side note: I’m personally super excited about how a metrics layer can interact with an active metadata platform to supercharge knowledge management for data teams. It’s been super exciting to see the metrics layer become more mainstream, which was a prediction I’d made at the beginning of this year.
Learn more about the metrics layer and my six big ideas in the data world this year.
Report: The Future of the Modern Data Stack in 2022