Sunday, December 3, 2023

Prepare and load Amazon S3 data into Teradata using AWS Glue through its native connector for Teradata Vantage


In this post, we explore how to use the AWS Glue native connector for Teradata Vantage to streamline data integrations and unlock the full potential of your data.

Businesses often rely on Amazon Simple Storage Service (Amazon S3) for storing large amounts of data from various data sources in a cost-effective and secure manner. For those using Teradata for data analysis, integrations through the AWS Glue native connector for Teradata Vantage unlock new possibilities. AWS Glue enhances the flexibility and efficiency of data management, allowing companies to seamlessly integrate their data, regardless of its location, with Teradata's analytical capabilities. This new connector eliminates technical hurdles related to configuration, security, and management, enabling companies to effortlessly export or import their datasets into Teradata Vantage. As a result, businesses can focus more on extracting meaningful insights from their data, rather than dealing with the intricacies of data integration.

AWS Glue is a serverless data integration service that makes it straightforward for analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. With AWS Glue, you can discover and connect to more than 100 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Teradata Corporation is a leading connected multi-cloud data platform for enterprise analytics, focused on helping companies use all their data across an enterprise, at scale. As an AWS Data & Analytics Competency partner, Teradata offers a complete cloud analytics and data platform, including for machine learning.

Introducing the AWS Glue native connector for Teradata Vantage

AWS Glue provides support for Teradata, accessible through both AWS Glue Studio and AWS Glue ETL scripts. With AWS Glue Studio, you benefit from a visual interface that simplifies the process of connecting to Teradata and authoring, running, and monitoring AWS Glue ETL jobs. For data developers, this support extends to AWS Glue ETL scripts, where you can use Python or Scala to create and manage more specific data integration and transformation tasks.

The AWS Glue native connector for Teradata Vantage allows you to efficiently read and write data from Teradata without the need to install or manage any connector libraries. You can add Teradata as both the source and target within AWS Glue Studio's no-code, drag-and-drop visual interface, or use the connector directly in an AWS Glue ETL script job.

Solution overview

In this example, you use AWS Glue Studio to enrich and upload data stored on Amazon S3 to Teradata Vantage. You start by joining the Event and Venue files from the TICKIT dataset. Next, you filter the results to a single geographic region. Finally, you upload the refined data to Teradata Vantage.

The TICKIT dataset tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In this dataset, analysts can identify ticket movement over time, success rates for sellers, and best-selling events, venues, and seasons.

For this example, you use AWS Glue Studio to develop a visual ETL pipeline. This pipeline will read data from Amazon S3, perform transformations, and then load the transformed data into Teradata. The following diagram illustrates this architecture.

Solution Overview

By the end of this post, your visual ETL job will resemble the following screenshot.

Visual ETL Job Flow

Prerequisites

For this example, you should have access to an existing Teradata database endpoint with network reachability from AWS and permissions to create tables and load and query data.

AWS Glue needs network access to Teradata to read or write data. How this is configured depends on where your Teradata is deployed and the specific network configuration. For Teradata deployed on AWS, you might need to configure VPC peering or AWS PrivateLink, security groups, and network access control lists (NACLs) to allow AWS Glue to communicate with Teradata over TCP. If Teradata is outside AWS, networking services such as AWS Site-to-Site VPN or AWS Direct Connect may be required. Public internet access is not recommended due to security risks. If you choose public access, it's safer to run the AWS Glue job in a VPC behind a NAT gateway. This approach lets you allowlist only one IP address for incoming traffic on your network firewall. For more information, refer to Infrastructure security in AWS Glue.

Set up Amazon S3

Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3, you must create an S3 bucket to hold the results. Complete the following steps:

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. For Name, enter a globally unique name for your bucket; for example, tickit8530923.
  4. Choose Create bucket.
  5. Download the TICKIT dataset and unzip it.
  6. Create the folder tickit in your S3 bucket and upload the allevents_pipe.txt and venue_pipe.txt files.
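The uploaded files are headerless, pipe-delimited text. As a quick sanity check before building the pipeline, you can parse a record locally with Python's csv module (the sample row below is hypothetical and only illustrates the venue_pipe.txt layout):

```python
import csv
import io

# Hypothetical record in the venue_pipe.txt layout:
# venueid|venuename|venuecity|venuestate|venueseats (no header line)
sample = "1|Toyota Park|Bridgeview|IL|0\n"

reader = csv.reader(io.StringIO(sample), delimiter="|")
venueid, venuename, venuecity, venuestate, venueseats = next(reader)
print(venuestate)  # IL
```

This mirrors the CSV-with-Pipe-delimiter settings you configure later in the AWS Glue Studio source nodes.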

Configure Teradata connections

To connect to Teradata from AWS Glue, see Configuring Teradata Connection.

You must create and store your Teradata credentials in an AWS Secrets Manager secret and then associate that secret with a Teradata AWS Glue connection. We discuss these two steps in more detail later in this post.

Create an IAM role for the AWS Glue ETL job

When you create the AWS Glue ETL job, you specify an AWS Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories) and Secrets Manager. For instructions, see Configure an IAM role for your ETL job.

Create a table in Teradata

Using your preferred database tool, log in to Teradata. Run the following code to create the table in Teradata where you will load your data:

CREATE MULTISET TABLE test.tickit, FALLBACK
   (venueid varchar(25),
    venuename varchar(100),
    venuecity varchar(100),
    venuestate varchar(25),
    venueseats varchar(25),
    eventid varchar(25),
    catid varchar(25),
    dateid varchar(25),
    eventname varchar(100),
    starttime varchar(100))
    NO PRIMARY INDEX
;

Store Teradata login credentials

An AWS Glue connection is a Data Catalog object that stores login credentials, URI strings, and more. The Teradata connector requires Secrets Manager for storing the Teradata user name and password that you use to connect to Teradata.

To store the Teradata user name and password in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Choose Store a new secret.
  3. Select Other type of secret.
  4. Enter the key/value USER and teradata_user, then choose Add row.
  5. Enter the key/value PASSWORD and teradata_user_password, then choose Next.

Teradata Secrets Manager Configuration

  1. For Secret name, enter a descriptive name, then choose Next.
  2. Choose Next to move to the review step, then choose Store.
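Under the hood, a secret stored this way is a small JSON document keyed by USER and PASSWORD. A minimal sketch of that payload (the values shown are placeholders, not real credentials):

```python
import json

# Sketch of the key/value payload the Teradata connector reads from the secret.
# Replace the placeholder values with your actual Teradata credentials.
secret_payload = json.dumps({
    "USER": "teradata_user",
    "PASSWORD": "teradata_user_password",
})

# The connector looks up these two keys by name, so they must match exactly.
print(sorted(json.loads(secret_payload).keys()))  # ['PASSWORD', 'USER']
```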

Create the Teradata connection in AWS Glue

Now you're ready to create an AWS Glue connection to Teradata. Complete the following steps:

  1. On the AWS Glue console, choose Connections under Data Catalog in the navigation pane.
  2. Choose Create connection.
  3. For Name, enter a name (for example, teradata_connection).
  4. For Connection type, choose Teradata.
  5. For Teradata URL, enter jdbc:teradata://url_of_teradata/database=name_of_your_database.
  6. For AWS Secret, choose the secret with your Teradata credentials that you created earlier.
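The Teradata URL follows the standard Teradata JDBC pattern: host, then a database parameter. A trivial sketch of how the pieces compose (the host and database names below are hypothetical; substitute your own endpoint):

```python
# Hypothetical endpoint and database; substitute your own values.
host = "mytd.example.com"
database = "test"

teradata_url = f"jdbc:teradata://{host}/database={database}"
print(teradata_url)  # jdbc:teradata://mytd.example.com/database=test
```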

Teradata Connection access

Create an AWS Glue visual ETL job to transform and load data to Teradata

Complete the following steps to create your AWS Glue ETL job:

  1. On the AWS Glue console, under ETL Jobs in the navigation pane, choose Visual ETL.
  2. Choose Visual ETL.
  3. Choose the pencil icon to enter a name for your job.

We add venue_pipe.txt as our first dataset.

  1. Choose Add nodes and choose Amazon S3 on the Sources tab.

Amazon S3 source node

  1. Enter the following data source properties:
    1. For Name, enter Venue.
    2. For S3 source type, select S3 location.
    3. For S3 URL, enter the S3 path to venue_pipe.txt.
    4. For Data format, choose CSV.
    5. For Delimiter, choose Pipe.
    6. Deselect First line of source file contains column headers.

S3 data source properties

Now we add allevents_pipe.txt as our second dataset.

  1. Choose Add nodes and choose Amazon S3 on the Sources tab.
  2. Enter the following data source properties:
    1. For Name, enter Event.
    2. For S3 source type, select S3 location.
    3. For S3 URL, enter the S3 path to allevents_pipe.txt.
    4. For Data format, choose CSV.
    5. For Delimiter, choose Pipe.
    6. Deselect First line of source file contains column headers.

Next, we rename the columns of the Venue dataset.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Rename Venue data.
    2. For Node parents, choose Venue.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: venueid
      2. col1: venuename
      3. col2: venuecity
      4. col3: venuestate
      5. col4: venueseats
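The Change Schema node is essentially a positional-to-named key mapping. A minimal plain-Python sketch of what it does to one Venue record (the sample row values are hypothetical):

```python
# Mapping applied by the Rename Venue data node: positional CSV keys -> names.
venue_mapping = {
    "col0": "venueid",
    "col1": "venuename",
    "col2": "venuecity",
    "col3": "venuestate",
    "col4": "venueseats",
}

# Hypothetical record as the S3 source node would emit it (headerless CSV).
row = {"col0": "1", "col1": "Toyota Park", "col2": "Bridgeview",
       "col3": "IL", "col4": "0"}

renamed = {venue_mapping[key]: value for key, value in row.items()}
print(renamed["venuestate"])  # IL
```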

Rename Venue data ETL Transform

Now we filter the Venue dataset to a specific geographic region.

  1. Choose Add nodes and choose Filter on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Location Filter.
    2. For Node parents, choose Rename Venue data (the filter condition uses the renamed venuestate key).
    3. For Filter condition, choose venuestate for Key, choose matches for Operation, and enter DC for Value.

Location Filter Settings

Now we rename the columns in the Event dataset.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Rename Event data.
    2. For Node parents, choose Event.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: eventid
      2. col1: e_venueid
      3. col2: catid
      4. col3: dateid
      5. col4: eventname
      6. col5: starttime

Next, we join the Venue and Event datasets.

  1. Choose Add nodes and choose Join on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Join.
    2. For Node parents, choose Location Filter and Rename Event data.
    3. For Join type, choose Inner join.
    4. For Join conditions, choose venueid for Location Filter and e_venueid for Rename Event data.

Join Properties

Now we drop the duplicate column.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Drop column.
    2. For Node parents, choose Join.
    3. In the Change Schema section, select Drop for e_venueid.
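Taken together, the filter, join, and drop nodes behave like the following plain-Python sketch. The rows are hypothetical stand-ins for the renamed Venue and Event datasets, chosen only to show one DC venue matching one event:

```python
# Hypothetical rows standing in for the renamed datasets.
venues = [
    {"venueid": "100", "venuename": "DAR Constitution Hall",
     "venuecity": "Washington", "venuestate": "DC", "venueseats": "0"},
    {"venueid": "1", "venuename": "Toyota Park",
     "venuecity": "Bridgeview", "venuestate": "IL", "venueseats": "0"},
]
events = [
    {"eventid": "5", "e_venueid": "100", "catid": "9", "dateid": "1836",
     "eventname": "Shea Seger", "starttime": "2008-01-11 19:30:00"},
]

# Location Filter: keep only venues where venuestate matches DC.
dc_venues = [v for v in venues if v["venuestate"] == "DC"]

# Inner join on venueid = e_venueid, then drop the duplicate e_venueid key.
joined = []
for venue in dc_venues:
    for event in events:
        if venue["venueid"] == event["e_venueid"]:
            row = {**venue, **event}
            row.pop("e_venueid")
            joined.append(row)

print(len(joined), joined[0]["eventname"])  # 1 Shea Seger
```

Only rows surviving both the filter and the inner join reach the Teradata target, with a single venueid column remaining.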

Drop column properties

Next, we load the data into the Teradata table.

  1. Choose Add nodes and choose Teradata on the Targets tab.
  2. Enter the following data sink properties:
    1. For Name, enter Teradata.
    2. For Node parents, choose Drop column.
    3. For Teradata connection, choose teradata_connection.
    4. For Table name, enter the schema.tablename of the table you created in Teradata.

Data sink properties Teradata

Finally, we run the job and load the data into Teradata.

  1. Choose Save, then choose Run.

A banner will display that the job has started.

  1. Choose Runs, which displays the status of the job.

The run status will change to Succeeded when the job is complete.

Run Status

  1. Connect to your Teradata and then query the table the data was loaded into.

The filtered and joined data from the two datasets will be in the table.

Filtered and joined data result

Clean up

To avoid incurring additional charges caused by resources created as part of this post, make sure you delete the items you created in the AWS account for this post:

  • The Secrets Manager secret created for the Teradata credentials
  • The AWS Glue connection created for Teradata
  • The data loaded in the S3 bucket
  • The AWS Glue visual ETL job

Conclusion

In this post, you created a connection to Teradata using AWS Glue and then created an AWS Glue job to transform and load data into Teradata. The AWS Glue native connector for Teradata Vantage empowers your data analytics journey by providing a seamless and efficient pathway for integrating your data with Teradata. This new capability in AWS Glue not only simplifies your data integration workflows but also opens up new avenues for advanced analytics, business intelligence, and machine learning innovations.

With the AWS Teradata Connector, you have a powerful tool at your disposal for simplifying data integration tasks. Whether you're looking to load Amazon S3 data into Teradata for analytics, reporting, or business insights, this new connector streamlines the process, making it more accessible and cost-effective.

To get started with AWS Glue, refer to Getting Started with AWS Glue.


About the Authors

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect and AWS Glue expert. He's on a mission to make life easier for customers who are facing complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding. Follow Kamen on LinkedIn to keep up to date with the latest AWS Glue news!

Sean Bjurstrom is a Technical Account Manager in ISV accounts at Amazon Web Services, where he specializes in analytics technologies and draws on his background in consulting to help customers on their analytics and cloud journeys. Sean is passionate about helping businesses harness the power of data to drive innovation and growth. Outside of work, he enjoys running and has participated in several marathons.

Vinod Jayendra is an Enterprise Support Lead in ISV accounts at Amazon Web Services, where he helps customers solve their architectural, operational, and cost-optimization challenges. With a particular focus on serverless technologies, he draws from his extensive background in application development to help customers build top-tier solutions. Beyond work, he finds joy in quality family time, embarking on biking adventures, and coaching youth sports teams.

Doug Mbaya is a Senior Partner Solution Architect with a focus in analytics and machine learning. Doug works closely with AWS partners and helps them integrate their solutions with AWS analytics and machine learning solutions in the cloud.


