AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores.
One of the most common questions we get from customers is how to effectively optimize costs on AWS Glue. Over the years, we have built multiple features and tools to help customers manage their AWS Glue costs. For example, AWS Glue Auto Scaling and AWS Glue Flex can help you reduce the compute cost associated with processing your data. AWS Glue interactive sessions and notebooks can help you reduce the cost of developing your ETL jobs. For more information about cost-saving best practices, refer to Monitor and optimize cost on AWS Glue for Apache Spark. Additionally, to understand data transfer costs, refer to the Cost Optimization Pillar described in the AWS Well-Architected Framework. For data storage, you can apply general best practices defined for each data source. For a cost optimization strategy using Amazon Simple Storage Service (Amazon S3), refer to Optimizing storage costs using Amazon S3.
In this post, we focus on the remaining piece: the cost of logs written by AWS Glue.
Before we get into the cost analysis of logs, let's understand the reasons to enable logging for your AWS Glue job and the options currently available. When you start an AWS Glue job, it sends real-time logging information to Amazon CloudWatch (every 5 seconds and before each executor stops) while the Spark application is running. You can view the logs on the AWS Glue console or the CloudWatch console dashboard. These logs provide you with insights into your job runs and help you optimize and troubleshoot your AWS Glue jobs. AWS Glue offers a variety of filters and settings to reduce the verbosity of your logs. As the number of job runs increases, so does the volume of logs generated.
To optimize CloudWatch Logs costs, AWS recently announced a new log class for infrequently accessed logs called Amazon CloudWatch Logs Infrequent Access (Logs IA). This new log class offers a tailored set of capabilities at a lower cost for infrequently accessed logs, enabling you to consolidate all your logs in one place in a cost-effective manner. This class provides a more cost-effective option for ingesting logs that only need to be accessed occasionally for auditing or debugging purposes.
In this post, we explain what the Logs IA class is, how it can help reduce costs compared to the standard log class, and how to configure your AWS Glue resources to use this new log class. By routing logs to Logs IA, you can achieve significant savings in your CloudWatch Logs spend without sacrificing access to important debugging information when you need it.
CloudWatch log groups used by AWS Glue job continuous logging
When continuous logging is enabled, AWS Glue for Apache Spark writes Spark driver/executor logs and progress bar information into the following log group:
/aws-glue/jobs/logs-v2
If a security configuration is enabled for CloudWatch logs, AWS Glue for Apache Spark creates a log group named as follows for continuous logs:
The default and custom log groups will be as follows:
- The default continuous log group will be /aws-glue/jobs/logs-v2-<Security-Configuration-Name>
- The custom continuous log group will be <custom-log-group-name>-<Security-Configuration-Name>
You can provide a custom log group name via the job parameter --continuous-log-logGroup.
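As an illustration, the following is a minimal boto3 sketch that lists the continuous log streams written for a particular job run. It assumes the default log group (no security configuration) and the default stream naming, in which stream names start with the job run ID; the run ID shown is a placeholder.

```python
import boto3

logs = boto3.client("logs")

# Placeholder job run ID; use the value returned by start_job_run.
job_run_id = "jr_0123456789abcdef"

# Continuous log streams for a run share the run ID as their name prefix,
# assuming no custom --continuous-log-logStreamPrefix is configured.
response = logs.describe_log_streams(
    logGroupName="/aws-glue/jobs/logs-v2",
    logStreamNamePrefix=job_run_id,
)

for stream in response["logStreams"]:
    print(stream["logStreamName"])
```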
Getting started with the new Infrequent Access log class for AWS Glue workloads
To gain the benefits of Logs IA for your AWS Glue workloads, you need to complete the following two steps:
- Create a new log group using the new Logs IA class.
- Configure your AWS Glue job to point to the new log group.
Complete the following steps to create a new log group using the new Infrequent Access log class (a programmatic alternative is shown after these steps):
- On the CloudWatch console, choose Log groups under Logs in the navigation pane.
- Choose Create log group.
- For Log group name, enter /aws-glue/jobs/logs-v2-infrequent-access.
- For Log class, choose Infrequent Access.
- Choose Create.
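If you prefer to create the log group programmatically, the following boto3 sketch does the same thing, assuming an SDK version recent enough to support the logGroupClass parameter of CreateLogGroup:

```python
import boto3

logs = boto3.client("logs")

# Create the log group with the Infrequent Access log class.
# The log class must be chosen at creation time; it cannot be changed later.
logs.create_log_group(
    logGroupName="/aws-glue/jobs/logs-v2-infrequent-access",
    logGroupClass="INFREQUENT_ACCESS",
)
```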
Complete the following steps to configure your AWS Glue job to point to the new log group (a per-run alternative using the API is shown after these steps):
- On the AWS Glue console, choose ETL jobs in the navigation pane.
- Choose your job.
- On the Job details tab, choose Add new parameter under Job parameters.
- For Key, enter --continuous-log-logGroup.
- For Value, enter /aws-glue/jobs/logs-v2-infrequent-access.
- Choose Save.
- Choose Run to trigger the job.
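The console steps above store the parameter as a default argument on the job. If you only want to redirect logs for a single run, one option is to pass the same parameter when starting the run, as in the following sketch (the job name is a placeholder):

```python
import boto3

glue = boto3.client("glue")

# Arguments passed to start_job_run override the job's default arguments
# for this run only, so logs for this run go to the Infrequent Access group.
response = glue.start_job_run(
    JobName="my-glue-job",  # placeholder job name
    Arguments={
        "--continuous-log-logGroup": "/aws-glue/jobs/logs-v2-infrequent-access",
    },
)
print(response["JobRunId"])
```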
New log events are written into the new log group.
View the logs with the Infrequent Access log class
Now you're ready to view the logs with the Infrequent Access log class. Open the log group /aws-glue/jobs/logs-v2-infrequent-access on the CloudWatch console.
When you choose one of the log streams, you'll notice that it redirects you to the CloudWatch Logs Insights page with a pre-configured default query and your log stream selected by default. By choosing Run query, you can view the actual log events on the Logs Insights page.
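You can also run the same kind of query programmatically. The following is a minimal sketch using the Logs Insights StartQuery and GetQueryResults APIs against the Infrequent Access log group; the query string mirrors the default fields/sort/limit pattern, and the one-hour time window is an arbitrary choice:

```python
import time
import boto3

logs = boto3.client("logs")

# Start a Logs Insights query against the Infrequent Access log group.
query_id = logs.start_query(
    logGroupName="/aws-glue/jobs/logs-v2-infrequent-access",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | sort @timestamp desc | limit 20",
)["queryId"]

# Poll until the query finishes, then print each result row.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})
```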
Considerations
Keep in mind the following considerations:
- You can't change the log class of a log group after it's created. You need to create a new log group to configure the Infrequent Access class.
- The Logs IA class offers a subset of CloudWatch Logs capabilities, including managed ingestion, storage, cross-account log analytics, and encryption, at a lower per-GB ingestion price. For example, you can't view log events through the standard CloudWatch Logs console. To learn more about the features offered in both log classes, refer to Log Classes.
Conclusion
This post provided step-by-step instructions to guide you through enabling Logs IA for your AWS Glue job logs. If your AWS Glue ETL jobs generate large volumes of log data that become a challenge to manage as you scale your applications, the best practices demonstrated in this post can help you scale cost-effectively while centralizing all your logs in CloudWatch Logs. Start using the Infrequent Access class with your AWS Glue workloads today and enjoy the cost benefits.
About the Authors
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is based in Tokyo, Japan. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.
Abeetha Bala is a Senior Product Manager for Amazon CloudWatch, primarily focused on logs. Being customer obsessed, she solves observability challenges through innovative and cost-effective ways.
Kinshuk Pahare is a leader on the AWS Glue product management team. He drives efforts on the platform, developer experience, and big data processing frameworks like Apache Spark, Ray, and Python Shell.