Deploy Amazon QuickSight dashboards to monitor AWS Glue ETL job metrics and set alarms

November 3, 2023

Regardless of the industry or level of maturity within AWS, our customers require better visibility into their AWS Glue usage. Better visibility can lead to gains in operational efficiency, informed business decisions, and further transparency into your return on investment (ROI) when using the various features available through AWS Glue.

As your company grows, you need to be able to answer simple questions about your AWS Glue usage, such as the following:

Where am I spending the most with AWS Glue?
Where can I save the most by taking advantage of new AWS Glue features?
What does my overall AWS Glue usage look like?

AWS offers services such as Amazon QuickSight, a serverless business intelligence (BI) service that enables you to centralize this view and even ask natural language questions of your data, using Amazon QuickSight Q. QuickSight can give business leaders and their technology counterparts a common landscape for reporting important details of their usage, providing automated narratives to bridge communication gaps.

In this post, we explore how to combine AWS Glue usage information and metrics with centralized reporting and visualization using QuickSight. This gives you a more comprehensive view of your usage and tools to help you dive deep into your AWS Glue job run environment. You have metrics available per job run within the AWS Glue console, but they don’t cover all available AWS Glue job metrics, and the visuals aren’t as interactive as those in the QuickSight dashboard.

Although we don’t cover optimizing your jobs for cost in this post, you can refer to Monitor and optimize cost on AWS Glue for Apache Spark to learn how to fine-tune your AWS Glue jobs for performance, efficiency, and cost-optimization.

Let’s dive in!

Solution overview

The following diagram illustrates the architecture for the given solution. At a high level, a scheduled event triggers an orchestration flow consisting of several data, compute, and analytics resources, the output of which culminates as a set of visuals in a BI dashboard.

solution architecture

Now let’s dig into the technical details involved in this solution.

An AWS Step Functions workflow is scheduled to run once per hour by Amazon EventBridge. The workflow triggers an AWS Lambda function that calls the AWS Glue GetJob and GetJobRun APIs. We parse this data to check for jobs that have succeeded, stopped, or failed in the past hour, as well as any streaming jobs. The metadata is extracted from each job run, including information like runtime, start time, end time, auto scaling, number of workers, and worker type, and is written to an Amazon DynamoDB table with TTL (Time to Live) enabled to ensure the table doesn’t grow too large.
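The filtering and TTL step described above can be sketched in plain Python. This is an illustrative sketch, not the code from the repository: the input dictionaries follow the field names of the AWS Glue job run response (`Id`, `JobName`, `JobRunState`, `StartedOn`, `CompletedOn`, `ExecutionTime`, `WorkerType`, `NumberOfWorkers`), while the output attribute names, the `ttl` attribute, and the 30-day retention period are assumptions made for the example. Streaming jobs, which the post also collects, are left out for brevity.

```python
from datetime import datetime, timedelta, timezone

# Terminal states the post describes: succeeded, stopped, or failed.
FINISHED_STATES = {"SUCCEEDED", "STOPPED", "FAILED"}


def select_recent_runs(job_runs, now, window=timedelta(hours=1),
                       retention=timedelta(days=30)):
    """Return table items for runs that finished within `window` before `now`.

    `retention` (hypothetical value) controls the TTL timestamp; DynamoDB
    expects TTL attributes as a number of epoch seconds.
    """
    items = []
    for run in job_runs:
        completed = run.get("CompletedOn")
        if run.get("JobRunState") not in FINISHED_STATES or completed is None:
            continue  # still running, or in a state this sketch ignores
        if not (now - window < completed <= now):
            continue  # finished outside the one-hour window
        items.append({
            "job_run_id": run["Id"],
            "job_name": run["JobName"],
            "state": run["JobRunState"],
            "start_time": run["StartedOn"].isoformat(),
            "end_time": completed.isoformat(),
            "runtime_seconds": run.get("ExecutionTime", 0),
            "worker_type": run.get("WorkerType"),
            "number_of_workers": run.get("NumberOfWorkers"),
            "ttl": int((completed + retention).timestamp()),
        })
    return items
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the hourly window easy to test and replay.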

We move into a parallel state to check the two tables that Amazon Athena writes the output of the federated queries to. Athena first checks that the tables exist in Amazon Simple Storage Service (Amazon S3), where the data will be stored. If the tables don’t exist, Athena creates them. One federated query gathers AWS Glue metric data from Amazon CloudWatch metrics; the other gathers data from the DynamoDB table where Lambda writes the AWS Glue job metadata it’s collecting. Both federated queries use appropriate filtering in order to scan only the required data from each source.

There is a choice state for each branch. If there is no new data to be added to a table in Amazon S3, the state ends and waits for the other to complete. For example, there could be an AWS Glue job that is running while the step is evaluating. In this case, the metrics for the job would be inserted in the table on Amazon S3, but the metadata from DynamoDB wouldn’t arrive until the following hour after the job has succeeded, stopped, or failed.
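To make the per-branch choice concrete, here is a minimal sketch that builds an Amazon States Language Choice state as a Python dictionary. The structure (`Type`, `Choices`, `Variable`, `NumericGreaterThan`, `Next`, `Default`) is standard Amazon States Language; the state names and the `$.newMetricRows` input path are hypothetical and are not taken from the deployed state machine.

```python
import json


def new_data_choice_state(count_path, insert_state, done_state):
    """Build a Choice state that continues only when new rows were found."""
    return {
        "Type": "Choice",
        "Choices": [
            {
                # Path to a row count produced by an earlier step (assumed).
                "Variable": count_path,
                "NumericGreaterThan": 0,
                "Next": insert_state,
            }
        ],
        # No new rows: end this branch and let the parallel state wait
        # for the other branch.
        "Default": done_state,
    }


metrics_choice = new_data_choice_state(
    "$.newMetricRows", "InsertMetrics", "NoNewMetrics"
)
print(json.dumps(metrics_choice, indent=2))
```

The same helper could describe the metadata branch with a different path and state names, which mirrors the post's point that each branch decides independently.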

When new metrics or metadata are found, Athena inserts this data into the metrics or metadata tables in Amazon S3, which are both partitioned by the hour. After the data is inserted, the final steps call the QuickSight CreateIngestion API, which triggers data ingestion into QuickSight SPICE to power interactive analysis. At this point, the workflow has finished running and will run again the following hour.
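The final CreateIngestion call needs an ingestion ID that is unique per refresh. The sketch below shows one way to build the request, assuming a timestamp-based ID; the helper names and the ID scheme are illustrative assumptions, and `quicksight_client` is assumed to be a boto3 QuickSight client (for example `boto3.client("quicksight")`), whose `create_ingestion` method takes `AwsAccountId`, `DataSetId`, and `IngestionId`.

```python
from datetime import datetime, timezone


def ingestion_request(account_id, dataset_id, now):
    """Build CreateIngestion arguments with a timestamp-based ingestion ID."""
    return {
        "AwsAccountId": account_id,
        "DataSetId": dataset_id,
        # Hypothetical ID scheme: dataset ID plus a UTC timestamp.
        "IngestionId": f"{dataset_id}-{now:%Y%m%d%H%M%S}",
    }


def start_refresh(quicksight_client, account_id, dataset_id, now=None):
    """Trigger a SPICE refresh for one dataset and return the API response."""
    now = now or datetime.now(timezone.utc)
    request = ingestion_request(account_id, dataset_id, now)
    return quicksight_client.create_ingestion(**request)
```

Keeping the request builder separate from the API call makes it possible to check the generated ID without AWS credentials.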

In the following sections, we show you how to set up the solution, explore the dashboards, and configure alarms.

The code for this solution can be found on the AWS samples GitHub repository.

Prerequisites

You should have the following prerequisites:

An AWS account with AWS Identity and Access Management (IAM) privileges sufficient to create the solution resources
QuickSight Standard or Enterprise Edition with a QuickSight user created
An AWS Cloud9 integrated development environment (IDE) or your local machine using your preferred IDE with the following packages installed:
The AWS Cloud Development Kit (AWS CDK) bootstrapped in your target AWS account and Region

Deploy solution resources with the AWS CDK

To provision the resources that build the dashboard and keep it up to date, we provide steps to download and deploy the solution via the AWS CDK. The solution was developed with cost-optimization as a priority, but some resources in the stack will incur costs once deployed.

This solution generates the following resources:

IAM role
EventBridge rule
Step Functions state machine
Lambda function
S3 bucket
Two AWS Glue tables and one AWS Glue database
DynamoDB table
Athena queries invoked by Step Functions
QuickSight data source, dataset, analysis, and dashboard

To deploy the solution, complete the following steps:

Clone the source code from the AWS samples GitHub repository to your client machine:
```
git clone https://github.com/aws-samples/glue-metrics-in-quicksight
```

Bootstrap your AWS CDK app:

```
cd glue-metrics-in-quicksight
npm i aws-cdk-lib
cdk bootstrap
```

Deploy the solution with the required parameters:
1. The first parameter is the name of a new S3 bucket to be created, which holds the AWS Glue metrics and metadata.
2. The second parameter is required in order for QuickSight to assign permissions to the user who will manage the assets. Refer to Managing user access inside Amazon QuickSight to find your existing QuickSight users.
```
cdk deploy --parameters BucketName=New-Unique-Bucket-Name --parameters QuicksightUsername=QuickSight-Existing-User
```

If your deployment fails, make sure you installed the AWS CDK library (npm i aws-cdk-lib, as in the bootstrap step), then rerun cdk deploy.

The deployment may take up to 10 minutes.
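Because the first deployment parameter names a bucket that does not exist yet, a deployment can fail late if the name breaks Amazon S3 naming rules. The following sketch checks a candidate name before you run cdk deploy. It covers only a subset of the documented rules (length, allowed characters, adjacent dots, IP-address form); it does not check reserved prefixes or suffixes, nor global uniqueness, and the function name is hypothetical.

```python
import re

# 3-63 characters; lowercase letters, digits, dots, and hyphens;
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name):
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if not _BUCKET_RE.match(name):
        problems.append(
            "must be 3-63 characters of lowercase letters, digits, dots, "
            "or hyphens, starting and ending with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IP_RE.match(name):
        problems.append("must not be formatted like an IP address")
    return problems
```

Note that a mixed-case placeholder value is itself rejected by this check, so replace it with an all-lowercase name of your own.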

After the solution is deployed, the Step Functions state machine evaluates once per hour whether it should ingest data into QuickSight. You can run some AWS Glue jobs after the stack is deployed and check the QuickSight dashboard in the next hour or two, where the job metadata and metrics will be populated for your analysis.

Explore the dashboard

The dashboard contains two sheets: Glue Jobs and Glue Metrics.

The Glue Jobs sheet includes all of the metadata about your AWS Glue job runs, including AWS Glue for Apache Spark, AWS Glue for Ray, and AWS Glue streaming ETL. Most of the visuals also have a hierarchy that you can drill down into with QuickSight, going as low as each specific job run ID. You can use controls to filter by date, job name, and job run ID.

In the following demonstration, you will see the pivot table, which is a simple view of all our job metadata, including estimated cost per job and job run. We open up a job name and see the different job runs. There is one individual job run that we want to inspect the metrics on, so we choose the job name and choose View metrics for job run id: <my job run id>. This takes us to the Glue Metrics sheet and automatically filters for the job run ID we want to view.

glue information sheet
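As a rough illustration of how an estimated cost per job run can be derived from the collected metadata, the sketch below multiplies DPU-hours by a price per DPU-hour. It is not the dashboard's actual calculation. The price of 0.44 USD per DPU-hour is an assumed example rate that varies by Region and job type, the mapping covers only the G-series worker types, and the estimate ignores billing minimums and auto scaling, which changes the number of workers during a run.

```python
# DPUs provided by each worker, per AWS Glue worker type.
DPU_PER_WORKER = {"G.025X": 0.25, "G.1X": 1, "G.2X": 2, "G.4X": 4, "G.8X": 8}


def estimated_cost(worker_type, number_of_workers, runtime_seconds,
                   price_per_dpu_hour=0.44):
    """Estimate the cost of one job run in USD (assumed flat rate)."""
    dpu_hours = (
        DPU_PER_WORKER[worker_type] * number_of_workers * runtime_seconds / 3600
    )
    return round(dpu_hours * price_per_dpu_hour, 4)


# Ten G.1X workers for 30 minutes is 5 DPU-hours.
print(estimated_cost("G.1X", 10, 1800))
```

For runs with auto scaling enabled, an estimate based on the maximum number of workers is an upper bound rather than the billed amount.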

The Glue Metrics sheet is built to mirror the documentation we provide in AWS Glue resource monitoring. This documentation helps explain each visual in the dashboard. You can use the Glue Metrics sheet to view aggregated metrics across all jobs, a single job, or down to the job run ID.

To populate the Glue Metrics sheet, your AWS Glue jobs must be enabled to capture metrics in CloudWatch.
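AWS Glue documents the `--enable-metrics` job parameter for turning on CloudWatch job metrics. As a minimal sketch, the helper below lists jobs whose default arguments lack that key, so you can spot jobs that would not appear on the Glue Metrics sheet. The input is assumed to be a list of job dictionaries shaped like the AWS Glue job response (`Name`, `DefaultArguments`); the helper name is hypothetical, and the check does not see arguments supplied only at run time.

```python
def jobs_without_metrics(jobs):
    """Return the names of jobs whose default arguments lack --enable-metrics."""
    return [
        job["Name"]
        for job in jobs
        # The parameter is a flag: its presence matters, not its value.
        if "--enable-metrics" not in job.get("DefaultArguments", {})
    ]
```

Running this against the job list from your account gives a quick checklist of jobs to update before expecting metrics in the dashboard.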

glue metrics sheet

Set up alerts

Setting up alerts on measures is also easy to do in QuickSight. To do so, choose (right-click) one of the tracked measures on either sheet and choose Create Alarm. This brings you to the configuration page to set up the metric you’d like to be alerted on.

quicksight alarm

The dashboard is designed to give you the freedom to modify it and make your own visualizations with the metadata and metrics that are provided to you. If you want even more insight into cost, consider deploying the CUDOS dashboard as well!

Clean up

If you no longer need the dashboard, delete the CDK app by running cdk destroy from the project directory.

Conclusion

In this post, we talked about the importance of having observability of your AWS Glue jobs and provided an AWS CDK app that deploys a QuickSight dashboard for you. We hope this helps you optimize your AWS Glue environment using the insights the dashboard provides. To learn about event-based alerting for your AWS Glue for Apache Spark and Ray jobs, refer to Automate alerting and reporting for AWS Glue job resource usage.

About the authors

Michael Hamilton is a Sr Analytics Solutions Architect focusing on helping enterprise customers in the south east modernize and simplify their analytics workloads on AWS. He enjoys mountain biking and spending time with his wife and three children when not working.

Cody Penta is a Solutions Architect at Amazon Web Services and is based out of Charlotte, NC. He has a focus in security and CDK, and enjoys solving the really difficult problems in the technology world. Off the clock, he loves relaxing in the mountains, coding personal projects, and gaming.

Angus Ferguson is a Solutions Architect at AWS who is passionate about meeting customers across the world, helping them solve their technical challenges. Angus specializes in Data & Analytics with a focus on customers in the financial services industry.