Thursday, October 12, 2023

Push Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch logs


Amazon EMR is a big data service offered by AWS to run Apache Spark and other open-source applications on AWS to build scalable data pipelines in a cost-effective manner. Monitoring the logs generated by the jobs deployed on EMR clusters is essential to help detect critical issues in real time and identify root causes quickly.

Pushing those logs into Amazon CloudWatch enables you to centralize them and drive actionable intelligence from them to address operational issues without needing to provision servers or manage software. You can instantly begin writing queries with aggregations, filters, and regular expressions. In addition, you can visualize time series data, drill down into individual log events, and export query results to CloudWatch dashboards.

To ingest logs that are persisted on the Amazon Elastic Compute Cloud (Amazon EC2) instances of an EMR cluster into CloudWatch, you can use the CloudWatch agent. It provides a simple way to push logs from an EC2 instance to CloudWatch.

The CloudWatch agent is a software package that autonomously and continuously runs on your servers. You can install and configure the CloudWatch agent to collect system and application logs from EC2 instances, on-premises hosts, and containerized applications. CloudWatch processes and stores the logs collected by the CloudWatch agent, which further helps with the performance and health monitoring of your infrastructure and applications.

In this post, we create an EMR cluster and centralize the EMR step logs of its jobs in CloudWatch. This makes it easier for you to manage your EMR cluster, troubleshoot issues, and monitor performance. The solution is particularly helpful if you want to use CloudWatch to collect and visualize real-time logs, metrics, and event data, streamlining your infrastructure and application maintenance.

Overview of solution

The solution presented in this post is based on a specific configuration where the EMR step concurrency level is set to 1. This means that only one step runs at a time on the cluster. It’s important to note that if the EMR step concurrency level is set to a value greater than 1, the solution may not work as expected. We highly recommend verifying your EMR step concurrency configuration before implementing the solution presented in this post.
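To check this setting before you start, you can query the cluster with the AWS CLI. The following is a minimal sketch that is not part of the provided template: `CLUSTER_ID` is a hypothetical input, and the script reads the `StepConcurrencyLevel` field returned by `aws emr describe-cluster`, printing the `aws emr modify-cluster --step-concurrency-level 1` command when the level differs from 1.

```bash
#!/usr/bin/env bash
# Sketch: check that an EMR cluster runs one step at a time (StepConcurrencyLevel = 1).
# CLUSTER_ID is a hypothetical input, e.g. CLUSTER_ID=j-XXXXXXXXXXXXX ./check_concurrency.sh
set -euo pipefail

# Succeeds only when the given level is exactly 1.
concurrency_ok() {
  [ "$1" = "1" ]
}

# Query AWS only when a cluster ID is supplied.
if [ -n "${CLUSTER_ID:-}" ]; then
  level=$(aws emr describe-cluster --cluster-id "$CLUSTER_ID" \
    --query 'Cluster.StepConcurrencyLevel' --output text)
  if concurrency_ok "$level"; then
    echo "StepConcurrencyLevel is 1; the solution applies as described."
  else
    echo "StepConcurrencyLevel is $level. To run one step at a time:" >&2
    echo "  aws emr modify-cluster --cluster-id $CLUSTER_ID --step-concurrency-level 1" >&2
    exit 1
  fi
fi
```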

The following diagram illustrates the solution architecture.

Solution Architecture Diagram

The workflow includes the following steps:

  1. Users start an Apache Spark EMR job, creating a step on the EMR cluster. Using Apache Spark, the workload is distributed across the different nodes of the EMR cluster.
  2. On each node (EC2 instance) of the cluster, a CloudWatch agent watches different log directories, capturing new entries in the log files and pushing them to CloudWatch.
  3. Users can view the step logs by accessing the different log groups on the CloudWatch console. The step logs written by Amazon EMR are as follows:
    • controller – Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.
    • stderr – The standard error channel of Spark while it processes the step.
    • stdout – The standard output channel of Spark while it processes the step.
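The bootstrap script loads the agent configuration from the Systems Manager parameter `AmazonCloudWatch-Config.json`, which the CloudFormation template creates. The template’s exact configuration isn’t reproduced in this post, so the following is a minimal sketch of what such a configuration could look like. It assumes the step logs are written under `/mnt/var/log/hadoop/steps/<step-id>/` on the primary node (the default Amazon EMR step log location) and that the log group is named `/aws/emr/master/<instance-id>`, with the `step-stdout`, `step-stderr`, and `step-controller` streams; the agent replaces the `{instance_id}` placeholder at runtime. The local file name `cw-agent-config.json` and the `UPLOAD` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write a CloudWatch agent config that ships EMR step logs, then
# optionally store it in the SSM parameter read by the bootstrap script.
set -euo pipefail

CONFIG_FILE="${CONFIG_FILE:-cw-agent-config.json}"   # hypothetical local file name

# Quoted 'EOF' keeps {instance_id} literal; the agent substitutes it at runtime.
# Paths and group/stream names are assumptions based on the default EMR log layout.
cat > "$CONFIG_FILE" <<'EOF'
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/mnt/var/log/hadoop/steps/s-*/stdout",
            "log_group_name": "/aws/emr/master/{instance_id}",
            "log_stream_name": "step-stdout"
          },
          {
            "file_path": "/mnt/var/log/hadoop/steps/s-*/stderr",
            "log_group_name": "/aws/emr/master/{instance_id}",
            "log_stream_name": "step-stderr"
          },
          {
            "file_path": "/mnt/var/log/hadoop/steps/s-*/controller",
            "log_group_name": "/aws/emr/master/{instance_id}",
            "log_stream_name": "step-controller"
          }
        ]
      }
    }
  }
}
EOF

# Upload only on request, so the script runs safely without AWS credentials.
if [ "${UPLOAD:-0}" = "1" ]; then
  aws ssm put-parameter --name AmazonCloudWatch-Config.json --type String \
    --value "file://$CONFIG_FILE" --overwrite
fi
```

Because every step writes to the same three stream names in this sketch, output from steps running in parallel would be interleaved, which is consistent with the requirement that the step concurrency level be 1.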

We provide an AWS CloudFormation template in this post as a general guide. The template demonstrates how to configure a CloudWatch agent on Amazon EMR to push Spark logs to CloudWatch. You can review and customize it as needed to include your Amazon EMR security configurations. As a best practice, we recommend including your Amazon EMR security configurations in the template to encrypt data in transit.

You should also be aware that some of the resources deployed by this stack incur costs while they remain in use.

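In the next sections, we go through the following steps:

  1. Create and upload the bootstrap script to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Use the CloudFormation template to create the following resources:
    • An AWS Identity and Access Management (IAM) role and IAM instance profile
    • An AWS Systems Manager parameter
    • An EMR cluster
  3. Monitor the Spark logs on the CloudWatch console.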

Prerequisites

This post assumes that you have the following:

Create and upload the bootstrap script to an S3 bucket

For more information, see Uploading objects and Installing and running the CloudWatch agent on your servers.

To create and upload the bootstrap script, complete the following steps:

  1. Create a local file named bootstrap_cloudwatch_agent.sh with the following content:
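    #!/bin/bash
    # Bootstrap action: install the CloudWatch agent and start it with the configuration
    # stored in the Systems Manager parameter AmazonCloudWatch-Config.json.
    # The RPM below targets x86_64 (amd64) Amazon Linux hosts such as m4.xlarge.

    echo -e 'Installing CloudWatch Agent... \n'
    sudo rpm -Uvh --force https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm

    echo -e 'Starting CloudWatch Agent... \n'
    # fetch-config loads the config; -m ec2 = EC2 mode; -c ssm:... = read it from SSM; -s = (re)start the agent.
    sudo amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s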

  2. On the Amazon S3 console, choose your S3 bucket.
  3. On the Objects tab, choose Upload.
  4. Choose Add files, then choose the bootstrap script.
  5. Choose Upload, then choose the file name: bootstrap_cloudwatch_agent.sh.
  6. Choose Copy S3 URI. We use this value in a later step.
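If you prefer the AWS CLI to the console, the upload can be sketched as follows. `BUCKET` and the default `scripts` key prefix are hypothetical placeholders; the script uploads the file with `aws s3 cp` and prints the S3 URI to use as the `BootstrapScriptPath` parameter.

```bash
#!/usr/bin/env bash
# Sketch: upload the bootstrap script with the AWS CLI instead of the console.
# BUCKET and PREFIX are hypothetical; set BUCKET to your own bucket name.
set -euo pipefail

SCRIPT_NAME="bootstrap_cloudwatch_agent.sh"

# Build the S3 URI used later as the BootstrapScriptPath stack parameter.
s3_uri() {
  local bucket="$1" prefix="$2"
  echo "s3://${bucket}/${prefix}/${SCRIPT_NAME}"
}

# Upload only when a bucket is supplied.
if [ -n "${BUCKET:-}" ]; then
  uri=$(s3_uri "$BUCKET" "${PREFIX:-scripts}")
  aws s3 cp "$SCRIPT_NAME" "$uri"
  echo "BootstrapScriptPath: $uri"
fi
```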

Provision resources with the CloudFormation template

Choose Launch Stack to launch a CloudFormation stack in your account and deploy the template.

This template creates an IAM role, IAM instance profile, Systems Manager parameter, and EMR cluster. The cluster starts the Spark Pi estimation example application. You will be billed for the AWS resources used if you create a stack from this template.

The CloudFormation wizard will ask you to modify or provide these parameters:

  • InstanceType – The instance type for all instance groups. The default is m4.xlarge.
  • InstanceCountCore – The number of instances in the core instance group. The default is 2.
  • EMRReleaseLabel – The Amazon EMR release label you want to use. The default is emr-6.9.0.
  • BootstrapScriptPath – The S3 path of the CloudWatch agent installation bootstrap script that you copied earlier.
  • Subnet – The EC2 subnet where the cluster launches. You must provide this parameter.
  • EC2KeyPairName – An optional EC2 key pair for connecting to cluster nodes, as an alternative to Session Manager.
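The same stack can be sketched with `aws cloudformation create-stack`. The template URL is a hypothetical placeholder (the post links the template through the Launch Stack button), the parameter keys come from the list above, and `CAPABILITY_IAM` is assumed to be required because the template creates an IAM role and instance profile; if the template names those resources explicitly, CloudFormation asks for `CAPABILITY_NAMED_IAM` instead. `EC2KeyPairName` is left out on the assumption that the template treats it as optional.

```bash
#!/usr/bin/env bash
# Sketch: create the demo stack from the CLI.
# TEMPLATE_URL, SUBNET_ID, and BOOTSTRAP_PATH are hypothetical inputs.
set -euo pipefail

STACK_NAME="EMR-CloudWatch-Demo"

# Print one --parameters entry per line; optional values fall back to the defaults listed above.
stack_parameters() {
  local subnet="$1" bootstrap="$2"
  printf '%s\n' \
    "ParameterKey=Subnet,ParameterValue=${subnet}" \
    "ParameterKey=BootstrapScriptPath,ParameterValue=${bootstrap}" \
    "ParameterKey=InstanceType,ParameterValue=${INSTANCE_TYPE:-m4.xlarge}" \
    "ParameterKey=InstanceCountCore,ParameterValue=${CORE_COUNT:-2}" \
    "ParameterKey=EMRReleaseLabel,ParameterValue=${RELEASE_LABEL:-emr-6.9.0}"
}

# Call AWS only when a template URL is supplied.
if [ -n "${TEMPLATE_URL:-}" ]; then
  mapfile -t params < <(stack_parameters "${SUBNET_ID:?SUBNET_ID is required}" \
    "${BOOTSTRAP_PATH:?BOOTSTRAP_PATH is required}")
  aws cloudformation create-stack --stack-name "$STACK_NAME" \
    --template-url "$TEMPLATE_URL" \
    --parameters "${params[@]}" \
    --capabilities CAPABILITY_IAM
  # Block until the stack (and the EMR cluster) finishes creating.
  aws cloudformation wait stack-create-complete --stack-name "$STACK_NAME"
fi
```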

Monitor the log streams

After the CloudFormation stack deploys successfully, on the CloudWatch console, choose Log groups in the navigation pane. Then filter the log groups by the prefix /aws/emr/master.

The ID in the log group name corresponds to the EC2 instance ID of the EMR primary node. If you have multiple EMR clusters, you can use this ID to identify a specific EMR cluster based on its primary node ID.

In the log group, you will find three different log streams.

The log streams contain the following information:

  • step-stdout – The standard output channel of Spark while it processes the step.
  • step-stderr – The standard error channel of Spark while it processes the step.
  • step-controller – Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.
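You can also follow these streams from a terminal. The following is a minimal sketch, assuming AWS CLI version 2 (which provides `aws logs tail`) and a log group named `/aws/emr/master/<primary-instance-id>`. It resolves the primary node’s instance ID with `aws emr list-instances` and then tails one stream; `CLUSTER_ID` and `STREAM` are hypothetical inputs.

```bash
#!/usr/bin/env bash
# Sketch: resolve the primary node's log group and follow one step log stream.
# Assumes AWS CLI v2 and a /aws/emr/master/<instance-id> log group name.
set -euo pipefail

# Map an EC2 instance ID to the assumed log group name.
log_group_for() {
  echo "/aws/emr/master/$1"
}

if [ -n "${CLUSTER_ID:-}" ]; then
  # MASTER is the instance group type of the primary node (instance-group clusters).
  primary_id=$(aws emr list-instances --cluster-id "$CLUSTER_ID" \
    --instance-group-types MASTER --query 'Instances[0].Ec2InstanceId' --output text)
  # Stream new events as they arrive; defaults to the stderr stream.
  aws logs tail "$(log_group_for "$primary_id")" \
    --log-stream-names "${STREAM:-step-stderr}" --follow
fi
```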

Clean up

To avoid future charges in your account, delete the resources you created in this walkthrough. The EMR cluster incurs charges as long as it is active, so stop it when you’re done.

  1. On the CloudFormation console, in the navigation pane, choose Stacks.
  2. Choose the stack you launched (EMR-CloudWatch-Demo), then choose Delete.
  3. Empty the S3 bucket you created.
  4. Delete the S3 bucket you created.
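The same cleanup can be sketched with the AWS CLI. `BUCKET` is a hypothetical placeholder for the bucket you created. By default the script only prints the commands it would run; setting `DRY_RUN=0` executes them. `aws s3 rb --force` removes all objects and then the bucket itself, covering the last two steps in one command.

```bash
#!/usr/bin/env bash
# Sketch: delete the demo stack and the bucket that holds the bootstrap script.
# BUCKET is a hypothetical input; commands are only printed unless DRY_RUN=0.
set -euo pipefail

STACK_NAME="EMR-CloudWatch-Demo"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup() {
  local bucket="$1"
  run aws cloudformation delete-stack --stack-name "$STACK_NAME"
  run aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME"
  # --force deletes every object first, then removes the empty bucket.
  run aws s3 rb "s3://$bucket" --force
}

if [ -n "${BUCKET:-}" ]; then
  cleanup "$BUCKET"
fi
```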

Conclusion

Now that you have completed the steps in this walkthrough, you have the CloudWatch agent running on your cluster hosts and configured to push EMR step logs to CloudWatch. With this setup, you can effectively monitor the health and performance of your Spark jobs running on Amazon EMR, detecting critical issues in real time and identifying root causes quickly.

You can package and deploy this solution through a CloudFormation template like this example template, which creates the IAM instance profile role, Systems Manager parameter, and EMR cluster.

To take this further, consider using these logs in CloudWatch alarms that alert on a log group metric filter. You can combine them with other alarms into a composite alarm, or configure alarm actions such as sending Amazon Simple Notification Service (Amazon SNS) notifications to trigger event-driven processes such as AWS Lambda functions.
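As a starting point, the following sketch creates a metric filter that counts log events containing `ERROR` in a step log group, plus an alarm that notifies an SNS topic when at least one such event appears in a 5-minute period. The log group name, metric name and namespace, filter and alarm names, the `ERROR` pattern, and `TOPIC_ARN` are hypothetical choices for illustration; adjust the pattern to the messages your jobs actually emit.

```bash
#!/usr/bin/env bash
# Sketch: alert on ERROR lines in the EMR step logs via a metric filter and an alarm.
# LOG_GROUP and TOPIC_ARN are hypothetical inputs; all names below are illustrative.
set -euo pipefail

METRIC_NAME="EmrStepErrors"
NAMESPACE="EMR/StepLogs"

# Metric transformation: emit 1 for every matching log event.
metric_transformation() {
  echo "metricName=${METRIC_NAME},metricNamespace=${NAMESPACE},metricValue=1"
}

if [ -n "${LOG_GROUP:-}" ] && [ -n "${TOPIC_ARN:-}" ]; then
  aws logs put-metric-filter --log-group-name "$LOG_GROUP" \
    --filter-name emr-step-errors --filter-pattern "ERROR" \
    --metric-transformations "$(metric_transformation)"

  # Alarm when the error count over one 5-minute period is at least 1.
  aws cloudwatch put-metric-alarm --alarm-name emr-step-errors \
    --namespace "$NAMESPACE" --metric-name "$METRIC_NAME" \
    --statistic Sum --period 300 --evaluation-periods 1 \
    --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$TOPIC_ARN"
fi
```

Treating missing data as `notBreaching` keeps the alarm in the OK state during quiet periods, when the filter publishes no data points.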


About the Author

Ennio Pastore is a Senior Data Architect on the AWS Data Lab team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. Ennio has over 10 years of experience in data analytics. He helps companies define and implement data platforms across industries, such as telecommunications, banking, gaming, retail, and insurance.


