Friday, September 8, 2023
Group 1001: An Active Metadata Pioneer – Atlan


Demystifying the Active Metadata Management Market

The Active Metadata Pioneers series features Atlan customers who have recently completed a thorough evaluation of the Active Metadata Management market. Paying forward what you’ve learned to the next data leader is the true spirit of the Atlan community! So they’re here to share their hard-earned perspective on an evolving market, what makes up their modern data stack, innovative use cases for metadata, and more!

In this edition, we meet Gu Xie, Head of Data Engineering at Group 1001 and two-time user of Atlan, who explains Group 1001’s unique structure and how that impacts their data needs, his hard-earned perspective on the active metadata management market, and how he’ll use Atlan to drive productivity and clarity across his organization.

This interview has been edited for brevity and clarity.


Would you mind describing Group 1001 and your data team?

Our group is the data engineering team. Group 1001 is an insurance holding company, an umbrella company for several different brands, including Delaware Life, Gainbridge, Clear Spring Life and Annuities, and several others.

What we’re focused on within our team is the annuity side of the business. So we directly interface with our core policy administration system for provisioning and handling all of the annuities business. Our engineering team is responsible for ensuring that we can provide analytics, whether it’s on the data within the annuity side of the business for our operations team, or from a sales perspective, or from a marketing perspective.

Each business is a little bit different. Gainbridge is a direct-to-consumer business brand, whereas Delaware Life revolves around a more financial advisor-level business where we’re doing more B2B2C. So two different businesses, different brands, different products, but we’re providing the breadth of analytics across those perspectives.

And how about you? Could you tell us a bit about yourself, your background, and what drew you to Data & Analytics?

I’ve been working in data engineering and data & analytics since the very start of my career. I’ve been in this industry for… gosh, I think it’s about 11 plus years now.

Right out of college, I had a really good opportunity to dive into the world of CRM, but ended up doing anything but CRM and focused more on the data itself. Whether it’s building out business intelligence, doing report migrations, doing data migrations, tons of work in terms of leading data warehouse teams, as well as leading and driving the modernization of data & analytics platforms as organizations moved to the cloud. That’s where I’ve built my core competency: really enabling and stitching together this modern data stack for an organization, such that they can get really comprehensive data & analytics capabilities without hiring a huge team.

So I’ve done this before in my prior organization with a team of 40 plus engineers. In that organization, we chose and then implemented a traditional data catalog, but spent a ton of engineering hours integrating it, then had trouble getting it adopted by consumers and stewards. We weren’t very happy with it. Then we migrated to Atlan and had much better luck activating the data stack we had all built together.

Here at Group 1001, we were able to build an entire end-to-end data analytics platform in under 10 months with a team of four. That just goes to show, if you have a really strong mental model of this modern data & analytics stack, and understand where your organization will need to fit and piece things together, you don’t need to have a huge engineering team. You can have a really small team that can really build and enable this.

We’re leveraging a lot of CI/CD and automation, and at the same time, are able to get the benefits of the modern data stack, which is incredible end-to-end velocity from idea to insight. That’s the focal point of the vision: idea-to-insight, and getting velocity there.

What does your stack look like?

We have data sources where data resides in databases, file logic storage, and SaaS applications like Zendesk, Google Analytics, and Salesforce. We have APIs, whether it’s internal APIs or events and logs.

The way we started with this tech stack, we built around Snowflake as our core data platform. We were on GCP, so we did an extensive POC between BigQuery and Snowflake, and ended up choosing Snowflake.

Then we ran into a situation where, “Okay, we need to replicate our data into Snowflake,” because in the past we were building ETL pipelines into Postgres originally, and it just doesn’t scale. So we leveraged Fivetran for both our CDC replication and our SaaS replication. So we can access the data from the database side of the fence, as well as tap into all the different SaaS applications that Fivetran supports. So we can onboard Google Analytics, Zendesk, Google Ads, as well as Salesforce data onto Snowflake to have that holistic centralization of all of our data and assets.

Then we also went down the path of, “We need to model and shape this data so it can be readily available for analytics and unify the data model across our various lines of business.” So we brought in Coalesce because that gave us the scale, the standardization, the automation that we need in order to create the data models and shape them for consumption. On top of that, we brought in Dagster as an orchestrator to fully replace Airflow. After setting up the infrastructure, one week later, three days after that, we migrated all 73 DAGs over to Dagster from Airflow. That was just huge.

We then also have Soda for building various data quality rules to ensure we have all the monitoring in place, and what the quality criteria are: integrity, completeness, freshness, those kinds of aspects. We use Soda to enable our team to build quality rules. And then that’s where Atlan comes into the journey. We see it as part of our data management suite: Soda from a quality monitoring perspective, and Atlan to enable data discovery.
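
As an editorial illustration (not taken from the interview), the completeness and freshness criteria mentioned above can be sketched in plain Python. This is a minimal sketch, not Soda's API: the function names, the sample rows, and the 24-hour threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(rows, timestamp_column, max_age, now):
    """True if the newest timestamp is no older than `max_age` relative to `now`."""
    timestamps = [
        row[timestamp_column]
        for row in rows
        if row.get(timestamp_column) is not None
    ]
    if not timestamps:
        return False
    return now - max(timestamps) <= max_age


# Hypothetical sample data; a real check would query the warehouse instead.
now = datetime(2023, 9, 8, 12, 0, tzinfo=timezone.utc)
policies = [
    {"policy_id": "P-1", "premium": 1000.0, "loaded_at": now - timedelta(hours=2)},
    {"policy_id": "P-2", "premium": None, "loaded_at": now - timedelta(hours=5)},
    {"policy_id": "P-3", "premium": 250.0, "loaded_at": now - timedelta(hours=30)},
    {"policy_id": "P-4", "premium": 400.0, "loaded_at": now - timedelta(hours=1)},
]

premium_completeness = completeness(policies, "premium")  # 3 of 4 rows are filled
fresh = is_fresh(policies, "loaded_at", timedelta(hours=24), now)  # newest row is 1 hour old
```

Passing `now` in explicitly keeps the freshness check deterministic, which makes a rule like this easy to test before it is wired into a monitoring tool.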

So an engineer, or an analyst, or even a business user can find out what data we have in the organization, who owns it, what it means, when it was last refreshed, and if it can be trusted. And also where is it being used and how is it being sourced? Atlan provides that holistic picture of that journey.

In terms of the analytical outputs, we use PowerBI in our existing reporting platform. We also brought in Sigma for embedded and exploratory analytics use cases.

Why did you need an Active Metadata solution?

That’s the hardest sell: “Why do we need a catalog solution? Why do we need an Active Metadata solution?”

And the way I approach this problem is just due to the underlying need. Data is always going to grow 2X every two years. That’s been the industry trend since the 1970s. Data doubles every two years.

So the problem that I see is as more data grows, there’s more metadata about that data, and that could be in the form of more database objects that you’re going to create, more files that you have to process, more sources that you ingest. Especially when you include more systems that you have to support, more BI tools that you have to enable, more anything. Think about that, doubling the data. The metadata is a magnitude-like factor on top of that.
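
As an editorial aside, the compounding behind “2X every two years” is easy to work through. The sketch below only assumes the doubling period quoted above; the function name and the chosen horizons are illustrative.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Multiplier on data volume after `years`, given a fixed doubling period."""
    return 2 ** (years / doubling_period_years)


for years in (2, 4, 10):
    print(f"after {years} years: {growth_factor(years):g}x the data")
# after 2 years: 2x the data
# after 4 years: 4x the data
# after 10 years: 32x the data
```

At that rate a platform sized for today's volume faces 32 times as much data within a decade, before counting the additional metadata described above.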

One of the biggest struggles in any data team is answering questions to and from a business user perspective, “How do I find X, Y, Z data? Where do I get this? Where do I find this report?” And even when data teams do have that, they’ll ask, “Well, where’s it coming from? How do I get the underlying detail of that information?”

And when something goes wrong, which it inevitably will, “How do I troubleshoot that?” And my experience is that if there’s one little column on that report in PowerBI that’s broken, a user will come and ask me, “Okay, what happened?”

And I don’t know, so I have to dig in. So you open up the report, and it’s an archeological exercise to excavate from the report to the pipelines, to the data sets, to the net source data to figure that out.
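
As an editorial illustration of why recorded lineage shortens that excavation, here is a minimal sketch of walking upstream from a report to its sources. It is not Atlan's API: the `UPSTREAM` mapping, the asset names, and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is built from.
UPSTREAM = {
    "powerbi.sales_report": ["snowflake.mart.sales_summary"],
    "snowflake.mart.sales_summary": [
        "snowflake.staging.policies",
        "snowflake.staging.opportunities",
    ],
    "snowflake.staging.policies": ["postgres.policy_admin.policies"],
    "snowflake.staging.opportunities": ["salesforce.opportunity"],
}


def trace_upstream(asset, upstream=UPSTREAM):
    """Return every asset that `asset` depends on, nearest first (breadth-first)."""
    seen, order = {asset}, []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def root_sources(asset, upstream=UPSTREAM):
    """Assets in the upstream chain that have no recorded parents."""
    return [a for a in trace_upstream(asset, upstream) if not upstream.get(a)]


lineage = trace_upstream("powerbi.sales_report")
sources = root_sources("powerbi.sales_report")
```

With the graph recorded, finding where a broken column originates becomes a lookup rather than a manual walk through reports, pipelines, and tables.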

That’s always been a challenge. And that, in my opinion, is the true technical debt that weighs on every single data team out there. It’s the fact that there’s never a good way of handling that metadata. And it rears its ugly head, just like every tech debt does, in the form of the team spending 80% of their time doing this: answering questions about the data, figuring out how people get access to data, and troubleshooting.

I’ve seen that data teams can spend upwards of 80% of their time in reactive mode. And if you average it out, I’ve seen it’s usually about a good 40% or 50% of their time spent answering questions. And that is a fundamental sink across all developer productivity in the organization.

How do you get more velocity? That’s where Atlan comes into play. Maybe we can enable a business user to answer the question themselves, or someone like a data analyst would be able to answer a question without involving engineering teams.

An engineering team can then focus on what they’re really supposed to do: acquire more data, enable more insights, and sit down with the business users who can help collaborate in that discussion about, “Hey, I have this idea, how do I enable this insight?” Rather than spending time answering the question of, “What went wrong here?” So that’s the way I see it, that’s the need, and to sell that need can be difficult.

I brought in Atlan because it will help our team be better at handling data. Once we onboard Atlan, that’s the productivity I want to get to: teams spending less time answering questions, and spending more time collaborating on data.

We’re also using Atlan as a way of creating an authoritative set of datasets so users know which data they can trust and use. We’re expanding our team to collaborate with other business groups such that they can self-service their data analytics, and Atlan will be key to enabling the collaboration model between engineering and business.

What made Atlan stand out in the market to you and your team?

Here’s the problem that I see in the marketplace. Every single catalog solution seems focused on just the catalog, or they focus on other product lines that are extensions of the catalog. In the case of traditional data catalogs like Alation, they focus on the fact that, “Hey, you can democratize data stewardship across the organization. Your entire organization could be stewarding data.” That was the genesis of it. So it’s the Wikipedia approach to data stewardship.

The reality is, there’s no team out there that has a data steward. Maybe in a large organization you have a few of them, but that’s not a role that you want to hire. What’s the value add, what’s the ROI for the data team, or from a data governance perspective?

In the past, I worked at a large Financial Services firm, and we experienced all the challenges involved with a traditional catalog. We’d spend a ton of engineering hours integrating with our existing systems, and then we would need an army of data stewards to build and maintain everything.

The reality with this approach is that you’re forcing data stewardship across every organization and they just don’t have the bandwidth to do it. That’s why I saw a huge retraction from Alation, with people going to use Confluence pages because it’s just easier to edit Confluence than to update a catalog.

So I knew there had to be a better approach to this problem, and that’s when I came across this article about “Data Catalog 3.0” by Prukalpa, and I was intrigued by this new approach. And I chose Atlan not just now, for Group 1001, but back in my previous role, too.

So one of the main reasons why I chose Atlan is that Atlan is focused on a very strong mission. That’s the core of it. Yes, it’s Active Metadata Management, but the real kicker is that Atlan’s vision is data collaboration between engineering, analysts, and business teams.

Alation is not that. Their business model is to catalog the data in their system, and that way they can sell you on the Composer (a SQL editor). That’s the bread-and-butter moneymaker, from what I’ve seen. Their core product of enabling the cataloging solution? They’ve never improved it, and they focus on Composer. I didn’t like that from a product development perspective.

And with Atlan, I see their journey is really enabling collaboration with data, whether it’s simplifying the amount of work from an engineering perspective to onboard the various data tools into Atlan. Or if it’s from an analyst perspective, being able to see the net data sets, see the lineage and leverage it, understanding where a dataset has been, or integrating Slack to enable that communication about data across the organization.

So that’s what I focus more on, number one, is the product vision and what their main mission is. And secondarily, on top of it, is just seeing the proof in the pudding, the developer velocity.

I know that in my previous organization we spent a ton of engineering hours to integrate our existing systems with a traditional data catalog. With Atlan, I was able to get Group 1001 up and running in under two hours. So just the developer velocity of not having to spend all that time configuring and building integrations because Atlan has out-of-the-box integrations to a lot of the core modern data stack? That is huge.

We could focus more on the higher-value ask, and the higher-value ask is to enable better collaboration within the organization around data. That’s the real reason why I chose Atlan.

What do you intend on creating with Atlan? Do you have an idea of what use cases you’ll build, and the value you’ll drive?

The use case that we’re using Atlan for right now is not the only use case that we ultimately want to build in the future. And the reason why is right now, we’re really focused on our core analytics stack, which involves Snowflake, Fivetran, Coalesce, Dagster, and the like. Sure, Atlan will solve that, but how do we extend Atlan across the enterprise? So enabling cross-enterprise data governance, a holistic view of our enterprise’s data assets, tracking PII and applying governance and policies related to it.

Any new business that we’re onboarding can come with their own data stack. So one of the core aspects from a data strategy perspective is that we can leverage Atlan as a central governance framework, so that all organizations will publish data assets into Atlan to have one holistic umbrella.

Another key use case is enabling self-service analytics across our organization. We plan to leverage Atlan to document our newly curated data so other departments can discover it and understand what the dataset is, how to use it, and whether they can trust the information. This will be key to facilitating collaboration with data and enabling our organization to be data-centric.

Photo by Benjamin Child on Unsplash


