Saturday, October 21, 2023

Accelerate your data warehouse migration to Amazon Redshift – Part 7


Tens of thousands of customers use Amazon Redshift to gain business insights from their data. With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. You can also integrate other AWS services such as Amazon EMR, Amazon Athena, Amazon SageMaker, AWS Glue, AWS Lake Formation, and Amazon Kinesis to take advantage of all of the analytic capabilities in AWS.

Migrating a data warehouse can be complex. You have to migrate terabytes or petabytes of data from your legacy system while not disrupting your production workload. You also need to ensure that the new target data warehouse is consistent with upstream data changes so that business reporting can continue uninterrupted when you cut over to the new platform.

Previously, there were two main ways to maintain data consistency after the initial bulk load during a migration to Amazon Redshift. You could identify the changed rows, perhaps using a filter on update timestamps, or you could modify your extract, transform, and load (ETL) process to write to both the source and target databases. Both of these options require manual effort to implement and increase the cost and risk of the migration project.

AWS Schema Conversion Tool (AWS SCT) can help you with the initial bulk load from Azure Synapse Analytics, BigQuery, Greenplum Database, IBM Netezza, Microsoft SQL Server, Oracle, Snowflake, Teradata, and Vertica. Now, we're happy to share that AWS SCT has automated maintaining data consistency for you. If you're migrating from an IBM Netezza data warehouse to Amazon Redshift, the AWS SCT data extractors automatically capture changes from the source and apply them on the target. You configure a change data capture (CDC) migration task in AWS SCT, and it extracts the relevant data changes from IBM Netezza and applies them in a transactionally consistent order on Amazon Redshift. You only need to configure the required resources on IBM Netezza and start the data migration; the source database remains fully operational during the migration and replication.

In this post, we describe at a high level how CDC tasks work in AWS SCT. Then we dive deep into an example of how to configure, start, and manage a CDC migration task. We look briefly at performance and how to tune a CDC migration, and then conclude with some information about how to get started on your own migration.

Solution overview

The following diagram shows the data migration and replication workflow with AWS SCT.

In the first step, your AWS SCT data extraction agent completes the full load of your source data to Amazon Redshift. Then the AWS SCT data extraction agent uses a history database in Netezza. The history database captures information about user activity such as queries, query plans, table access, column access, session creation, and failed authentication requests. The data extraction agent extracts information about the transactions that you run on your source Netezza database and replicates them to your target Redshift database.

You can start ongoing replication automatically after you complete the full load. Alternatively, you can start CDC at a later time or on a schedule. For more information, refer to Configuring ongoing data replication.

At a high level, the ongoing replication flow is as follows.

At the start of the replication, the data extraction agent captures the last transaction identifier in the history table. The data extraction agent stores this value in the max_createxid variable. To capture the transaction ID, the agent runs the following query:

SELECT max(XID) AS max_createxid 
    FROM <DB_NAME>.<SCHEMA_NAME>."$hist_plan_prolog_n";

If this transaction ID value is different from the CDC start point, then the agent identifies the delta to replicate. This delta includes all transactions for the selected tables that happened after the full load or after the previous replication. The data extraction agent selects the updated data from your source table.

From this updated data, AWS SCT creates two temporary tables. The first table includes all rows that you deleted from your source database and the old data of the rows that you updated. The second table includes all rows that you inserted and the new data of the rows that you updated. AWS SCT then uses these tables in JOIN clauses to replicate the changes to your target Redshift database.

Next, AWS SCT copies these tables to your Amazon Simple Storage Service (Amazon S3) bucket and uses this data to update your target Redshift cluster.

After updating your target database, AWS SCT deletes these temporary tables. Next, your data extraction agent sets the value of the CDC start point equal to the captured transaction ID (max_createxid). During the next data replication run, your agent determines the delta to replicate using this updated CDC start point.

All changes that happen to your source database during the replication run are captured in the next replication run. Make sure that you repeat the replication steps until the delta is equal to zero for each table that you included in the migration scope. At this point, you can cut over to your new databases.
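This repeat-until-zero-delta check can be sketched as a small polling loop. This is a hypothetical sketch, not AWS SCT code: `get_table_deltas` stands in for however you obtain per-table delta row counts, and the values here are simulated.

```python
from collections import deque

# Simulated per-run delta row counts for each table in the migration scope.
pending = {
    "CDC_DEMO": deque([5, 2, 0]),
    "ORDERS": deque([3, 0, 0]),
}

def get_table_deltas():
    """Hypothetical stand-in for querying each table's remaining delta."""
    return {table: (q.popleft() if q else 0) for table, q in pending.items()}

runs = 0
while True:
    runs += 1
    deltas = get_table_deltas()
    if all(d == 0 for d in deltas.values()):
        break  # every table reports a zero delta: safe to cut over

print(runs)  # -> 3
```

In this simulation, the loop stops after the third run, when both tables report a zero delta and the cutover condition is met.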

Configure your source database

On your source Netezza database, create a history database and configure history logging. Next, grant read permissions for all tables in the history database to the user that you use in the AWS SCT project. This user has the minimal permissions that are required to convert your source database schemas to Amazon Redshift.
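The exact commands depend on your Netezza release, but the setup might look like the following sketch. HISTDB, HISTUSER, sct_hist, and the password are placeholder values; the history database itself is typically created first with the `nzhistcreatedb` utility, and you should verify the options against the IBM Netezza documentation for your version.

```sql
-- Sketch only: create and activate a query-history configuration
-- (placeholder names; confirm the exact syntax for your Netezza release).
CREATE HISTORY CONFIGURATION sct_hist
    HISTTYPE QUERY
    DATABASE HISTDB
    USER HISTUSER
    PASSWORD 'your_password'
    COLLECT QUERY, PLAN, TABLE, COLUMN;

SET HISTORY CONFIGURATION sct_hist;
-- A restart of the Netezza instance is required before logging begins.
-- Afterward, grant the AWS SCT user read access to the history tables.
```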

Configure AWS SCT

Before you start data migration and replication with AWS SCT, make sure that you download the Netezza and Amazon Redshift drivers. Note the path to the folder where you saved these files. You will specify this path in the AWS SCT and data extraction agent settings.

To make sure the data extraction agents work properly, install the latest version of Amazon Corretto 11.

To configure AWS SCT, complete the following steps:

  1. After you create an AWS SCT project, connect to your source and target databases and set up the mapping rules. A mapping rule describes a source-target pair that defines the migration target for your source database schema.
  2. Convert your database schemas and apply them to Amazon Redshift if you haven't done this yet. Make sure that the target tables exist in your Redshift database before you start data migration.
  3. Now, install the data extraction agent. The AWS SCT installer includes the installation files for data extraction agents in the agents folder. Configure your data extraction agents by adding the listening port number and the path to the source and target database drivers. For the listening port, you can proceed with the default value. For database drivers, enter the path that you noted before.

The following diagram shows how the AWS SCT data extraction agents work.

After you install the data extraction agent, register it in AWS SCT.

  1. Open the data migration view in AWS SCT and choose Register.
  2. Enter the name of your agent, the host name, and the port that you configured in the previous step. For the host name, you can use localhost 0.0.0.0 if you run AWS SCT on the same machine where you installed the data extraction agent.

Create and run a CDC task

Now you can create and manage your data migration and replication tasks. To do so, complete the following steps:

  1. Select the tables in your source database to migrate, open the context (right-click) menu, and choose Create local task.
  2. Choose your data migration mode (for this post, choose Extract, upload and copy to replicate data changes from your source database):
    1. Extract only – Extract your data and save it to your local working folders.
    2. Extract and upload – Extract your data and upload it to Amazon S3.
    3. Extract, upload and copy – Extract your data, upload it to Amazon S3, and copy it into your Redshift data warehouse.
  3. Choose your encryption type. Make sure that you configure encryption for safe and secure data migrations.
  4. Select Enable CDC.

  5. After this, you can switch to the CDC settings tab.
  6. For CDC mode, you can choose from the following options:
    1. Migrate existing data and replicate ongoing changes – Migrate all existing source data and then start the replication. This is the default option.
    2. Replicate data changes only – Start data replication immediately.

Sometimes you don't need to migrate all existing source data. For example, if you have already migrated your data, you can start the data replication by choosing Replicate data changes only.

  7. If you choose Replicate data changes only, you can also set the Last CDC point to configure the replication to start from that point. If it's not set, AWS SCT data extraction agents replicate all changes that occur after your replication task is started.

If your replication task failed, you can restart the replication from the point of failure. Find the identifier of the last migrated CDC point on the CDC processing details tab in AWS SCT, set Last CDC point, and start the task again. This allows AWS SCT data extraction agents to replicate all changes in your source tables to your target database without gaps.

  8. You can also configure when you want the CDC runs to begin.

If you select Immediately, the first replication runs immediately after your agent completes the full load. Alternatively, you can specify the date and time when you want the replication to start.

  9. You can also schedule when to run the replication again. Enter the number of days, hours, or minutes after which to repeat the replication runs. Set these values depending on the intensity of data changes on your source database.
  10. Finally, you can set the end date when AWS SCT will stop running the replication.

  11. On the Amazon S3 settings tab, you can connect your AWS SCT data extraction agent with your Amazon S3 bucket.

You don't need to do this step if you have configured the AWS service profile in the global application settings.

  12. After you have configured all settings, choose Create to create a CDC task.

  13. Start this task in the AWS SCT user interface.

The following screenshots show examples of the AWS SCT user interface after you start tasks.

You can run multiple CDC tasks in parallel at the same time. For example, you can include different sets of source tables in each task to replicate the changes to different target Redshift clusters. AWS SCT handles these replication tasks and distributes resources correctly to minimize the replication time.

Data replication limitations

There are a few limitations in AWS SCT data replication:

  • Changes in your source database don't trigger the replication run because AWS SCT isn't able to automate these runs (as of this writing). You can instead run the data replication tasks on a predefined schedule.
  • AWS SCT doesn't replicate TRUNCATE and DDL statements. If you change the structure of your source table or truncate it, then you must run the same statements on your target database. You should make these changes manually because AWS SCT isn't aware of structure updates.

End-to-end example

Now that you know how to create a local replication task in AWS SCT, we dive deep and show how AWS SCT performs the extract and load processes.

  1. First, we run the following code to check that we correctly configured our source Netezza database. To use this code example, change the name of your history database.
SELECT COUNT(*) FROM HISTDB.DBE."$hist_column_access_1";

If you configured your database correctly, then the output of this command includes a value that's different from zero. In our case, the result is as follows:

 COUNT |
-------+
2106717|

  2. Now we create a table on Netezza to use in the example. The table has three columns and a primary key.
DROP TABLE NZ_DATA4EXTRACTOR.CDC_DEMO IF EXISTS; 
CREATE TABLE NZ_DATA4EXTRACTOR.CDC_DEMO 
(
    ID INTEGER NOT NULL, 
    TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
    CMNT CHARACTER(16)
)
DISTRIBUTE ON RANDOM; 

ALTER TABLE NZ_DATA4EXTRACTOR.CDC_DEMO 
    ADD CONSTRAINT CDC_DEMO_PK PRIMARY KEY (ID); 

SELECT * 
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO 
    ORDER BY ID;

  3. The SELECT statement returns an empty table.
  4. Before we start the replication, we run the following query on Netezza to get the latest transaction identifier in the history table:
SELECT MAX(XID) AS XID 
    FROM HISTDB.DBE."$hist_plan_prolog_1";

For our test table, the script prints the last transaction identifier, which is 2691798:

  5. To make sure that our table doesn't include new transactions, AWS SCT runs the following script. If you want to run this script manually, change 2691798 to the last transaction identifier in your history table.
SELECT MAX(CREATEXID) AS CREATEXID
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO 
    WHERE CREATEXID > 2691798;

As expected, the script doesn't return any values.

CREATEXID and DELETEXID are hidden system columns that exist in every Netezza table. CREATEXID identifies the transaction ID that created the row, and DELETEXID identifies the transaction ID that deleted the row. AWS SCT uses them to find changes in the source data.
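For example, with the demo table used in this post, you can inspect these hidden columns directly. Note that `show_deleted_records` must be enabled first so that logically deleted row versions are also visible:

```sql
-- Expose logically deleted row versions, then read the hidden columns.
SET show_deleted_records = true;

SELECT CREATEXID, DELETEXID, ID, CMNT
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    ORDER BY CREATEXID;
```

A live row has DELETEXID = 0; a deleted or superseded row version carries the ID of the transaction that removed it.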

Now we're ready to start the replication.

  6. We assume you've used AWS SCT to convert the example table and build it on the target Amazon Redshift. AWS SCT runs the following statement on Amazon Redshift:
DROP TABLE IF EXISTS nz_data4extractor_nz_data4extractor.cdc_demo;

CREATE TABLE nz_data4extractor_nz_data4extractor.cdc_demo
(
    id INTEGER ENCODE AZ64 NOT NULL,
    ts TIMESTAMP WITHOUT TIME ZONE ENCODE AZ64 DEFAULT SYSDATE::TIMESTAMP,
    cmnt CHARACTER VARYING(48) ENCODE LZO
)
DISTSTYLE AUTO;

ALTER TABLE nz_data4extractor_nz_data4extractor.cdc_demo
    ADD CONSTRAINT cdc_demo_pk PRIMARY KEY (id);

  7. AWS SCT also creates a staging table that holds replication changes until they can be applied on the actual target table:
CREATE TABLE IF NOT EXISTS "nz_data4extractor_nz_data4extractor"."_cdc_unit"
    (LIKE "nz_data4extractor_nz_data4extractor"."cdc_demo" INCLUDING  DEFAULTS);
ALTER TABLE "nz_data4extractor_nz_data4extractor"."_cdc_unit" 
    ADD COLUMN deletexid_ BIGINT;

  8. AWS SCT runs the following query to capture all changes that happened after the last transaction identifier:
SET show_deleted_records = true;

SELECT 
    ID, 
    TS, 
    CMNT
FROM NZ_DATA4EXTRACTOR.CDC_DEMO
WHERE CREATEXID <= 2691798
    AND (DELETEXID = 0 OR DELETEXID > 2691798)

This script returns an empty table.

  9. Now, we change data on Netezza and see how it gets replicated to Amazon Redshift:
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (1, 'One');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (2, 'Two');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (3, 'Three');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (4, 'Four');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (5, 'Five');

SELECT * FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    ORDER BY ID;

The preceding script returns the following result:

ID|TS                     |CMNT            |
--+-----------------------+----------------+
 1|2023-03-07 14:05:11.000|One             |
 2|2023-03-07 14:05:11.000|Two             |
 3|2023-03-07 14:05:11.000|Three           |
 4|2023-03-07 14:05:11.000|Four            |
 5|2023-03-07 14:05:11.000|Five            |

  10. AWS SCT checks for data changes starting from the last transaction ID using the following query:
SET show_deleted_records = true;

SELECT MAX(CREATEXID) AS CREATEXID
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    WHERE createxid > 2691798

The script returns a result that's different from zero:

CREATEXID|
---------+
  2691824|

Because the new transaction ID is greater than the last transaction ID, the history table contains new data to be replicated.

  11. AWS SCT runs the following query to extract the changes. The application detects all rows that were inserted, deleted, or updated within the scope of transactions with IDs in the range from 2691798 + 1 to 2691824.
SELECT
    ID,
    TS,
    CMNT,
    deletexid_
FROM (
    SELECT
        createxid,
        rowid,
        deletexid,
        2691798 AS min_CDC_trx,
        2691824 AS max_CDC_trx,
        CASE WHEN deletexid > max_CDC_trx
            THEN 0
            ELSE deletexid
            END AS deletexid_,
        MIN(createxid) OVER (PARTITION BY rowid) AS min_trx,
        COUNT(1) OVER (PARTITION BY rowid) AS rowid_cnt,
        ID,
        TS,
        CMNT
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO AS t
    WHERE deletexid <> 1
        AND (CREATEXID > min_CDC_trx OR deletexid_ > min_CDC_trx) -- Prior run max trx
        AND CREATEXID <= max_CDC_trx -- Current run max trx
    ) AS r
WHERE (min_trx = createxid OR deletexid_ = 0)
    AND NOT (
        CREATEXID > min_CDC_trx 
        AND deletexid <= max_CDC_trx 
        AND rowid_cnt = 1 
        AND deletexid > 0
        )

The extracted data is as follows:

ID|TS                     |CMNT            |DELETEXID_|
--+-----------------------+----------------+----------+
 1|2023-03-07 14:05:11.000|One             |         0|
 2|2023-03-07 14:05:11.000|Two             |         0|
 3|2023-03-07 14:05:11.000|Three           |         0|
 4|2023-03-07 14:05:11.000|Four            |         0|
 5|2023-03-07 14:05:11.000|Five            |         0|

  12. Next, AWS SCT compresses the data and uploads it to Amazon S3. Then AWS SCT runs the following command to copy the data into the staging table on Amazon Redshift:
TRUNCATE TABLE "nz_data4extractor_nz_data4extractor"."_cdc_unit"; 

COPY "nz_data4extractor_nz_data4extractor"."_cdc_unit" 
    ("id", "ts", "cmnt", "deletexid_")
    FROM 's3://bucket/folder/unit_1.manifest' MANIFEST
    CREDENTIALS '...'
    REGION '...'
    REMOVEQUOTES
    IGNOREHEADER 1
    GZIP
    DELIMITER '|';

  13. From the staging table, AWS SCT applies the changes to the actual target table. For this iteration, we insert new rows into the Redshift table:
INSERT INTO "nz_data4extractor_nz_data4extractor"."cdc_demo"("id", "ts", "cmnt")
SELECT 
        "id", 
        "ts", 
        "cmnt" 
    FROM "nz_data4extractor_nz_data4extractor"."_cdc_unit" t2
    WHERE t2.deletexid_ = 0;

  14. Let's run another script that not only inserts, but also deletes and updates data in the source table:
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (6, 'Six');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (7, 'Seven');
INSERT INTO NZ_DATA4EXTRACTOR.CDC_DEMO (ID, CMNT) VALUES (8, 'Eight');

DELETE FROM NZ_DATA4EXTRACTOR.CDC_DEMO WHERE ID = 1;
DELETE FROM NZ_DATA4EXTRACTOR.CDC_DEMO WHERE ID = 7;

UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Updated Two' WHERE ID = 2;
UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Updated Four' WHERE ID = 4;
UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Replaced Four' WHERE ID = 4;
UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Yet Again Four' WHERE ID = 4;
UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Updated Five' WHERE ID = 5;
UPDATE NZ_DATA4EXTRACTOR.CDC_DEMO SET CMNT = 'Updated Eight' WHERE ID = 8;

DELETE FROM NZ_DATA4EXTRACTOR.CDC_DEMO WHERE ID = 5;

SELECT * FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    ORDER BY ID;

The Netezza table contains the following rows:

ID|TS                     |CMNT            |
--+-----------------------+----------------+
 2|2023-03-07 14:05:11.000|Updated Two     |
 3|2023-03-07 14:05:11.000|Three           |
 4|2023-03-07 14:05:11.000|Yet Again Four  |
 6|2023-03-07 14:07:09.000|Six             |
 8|2023-03-07 14:07:10.000|Updated Eight   |

  15. AWS SCT detects the changes as before, using the new transaction ID:
SET show_deleted_records = true;

SELECT MAX(CREATEXID) AS CREATEXID
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    WHERE createxid > 2691824

CREATEXID|
---------+
  2691872|

SELECT
    ID,
    TS,
    CMNT,
    deletexid_
FROM (
    SELECT
        createxid,
        rowid,
        deletexid,
        2691824 AS min_CDC_trx,
        2691872 AS max_CDC_trx,
        CASE WHEN deletexid > max_CDC_trx
            THEN 0
            ELSE deletexid
            END AS deletexid_,
        MIN(createxid) OVER (PARTITION BY rowid) AS min_trx,
        COUNT(1) OVER (PARTITION BY rowid) AS rowid_cnt,
        ID,
        TS,
        CMNT
    FROM NZ_DATA4EXTRACTOR.CDC_DEMO AS t
    WHERE deletexid <> 1
        AND (CREATEXID > min_CDC_trx OR deletexid_ > min_CDC_trx) -- Prior run max trx
        AND CREATEXID <= max_CDC_trx -- Current run max trx
    ) AS r
WHERE (min_trx = createxid OR deletexid_ = 0)
    AND NOT (
        CREATEXID > min_CDC_trx 
        AND deletexid <= max_CDC_trx 
        AND rowid_cnt = 1 
        AND deletexid > 0
        )

The extracted changes appear as follows:

ID|TS                     |CMNT          |DELETEXID_|
--+-----------------------+--------------+----------+
 1|2023-03-07 14:05:11.000|One           |   2691856|
 2|2023-03-07 14:05:11.000|Two           |   2691860|
 2|2023-03-07 14:05:11.000|Updated Two   |         0|
 4|2023-03-07 14:05:11.000|Four          |   2691862|
 4|2023-03-07 14:05:11.000|Yet Again Four|         0|
 5|2023-03-07 14:05:11.000|Five          |   2691868|
 6|2023-03-07 14:07:09.000|Six           |         0|
 8|2023-03-07 14:07:10.000|Eight         |   2691870|
 8|2023-03-07 14:07:10.000|Updated Eight |         0|

Notice that we inserted a new row with ID 7 and then deleted this row. Therefore, we can ignore the row with ID 7 in our delta.

Also, we made several updates to the row with ID 4. In our delta, we include the original and the latest versions of the row. We ignore all intermediate versions.

We updated the row with ID 5 and then deleted this row. We don't include the updated row in our delta.

In this way, AWS SCT optimizes the migrated data, reducing the migration time and the network traffic.
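To make these filtering rules concrete, here is a toy Python re-implementation of the extraction logic. This is an illustrative sketch under simplifying assumptions, not AWS SCT code: each input tuple is (rowid, createxid, deletexid, data), with deletexid = 0 for live rows, and the transaction IDs are simulated.

```python
from collections import defaultdict

def extract_delta(rows, min_trx, max_trx):
    """Mimic the extraction query: keep delete markers for pre-window
    versions, inserts for surviving versions, and drop everything else."""
    windowed = []
    for rowid, createxid, deletexid, data in rows:
        dxid = 0 if deletexid > max_trx else deletexid  # cap future deletes
        if deletexid != 1 and createxid <= max_trx \
                and (createxid > min_trx or dxid > min_trx):
            windowed.append((rowid, createxid, deletexid, dxid, data))

    by_rowid = defaultdict(list)  # window functions: partition by rowid
    for version in windowed:
        by_rowid[version[0]].append(version)

    delta = []
    for versions in by_rowid.values():
        min_create = min(v[1] for v in versions)
        cnt = len(versions)
        for rowid, createxid, deletexid, dxid, data in versions:
            if createxid != min_create and dxid != 0:
                continue  # intermediate version of a multi-update chain
            if createxid > min_trx and deletexid <= max_trx \
                    and cnt == 1 and deletexid > 0:
                continue  # created and deleted entirely inside the window
            delta.append((data, dxid))
    return sorted(delta)

# Simulated row versions for one window (min_trx=100, max_trx=200),
# mirroring the example: an update chain, a delete, an insert-then-delete.
rows = [
    ("A", 50, 150, "One"),             # deleted inside the window
    ("B", 50, 160, "Two"),             # old version of an updated row
    ("B", 160, 0, "Updated Two"),      # current version
    ("C", 50, 161, "Four"),            # original version
    ("C", 161, 163, "Replaced Four"),  # intermediate version (dropped)
    ("C", 163, 0, "Yet Again Four"),   # latest version
    ("D", 150, 155, "Seven"),          # inserted and deleted (dropped)
    ("E", 150, 0, "Six"),              # plain insert
    ("F", 50, 165, "Five"),            # updated, then deleted
    ("F", 165, 170, "Updated Five"),   # never reaches the target (dropped)
]
delta = extract_delta(rows, 100, 200)
print(delta)
```

As in the example above, the rows for IDs "Seven", "Replaced Four", and "Updated Five" never leave the source: only delete markers and latest surviving versions make it into the delta.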

  16. Now, as before, AWS SCT compresses the data, uploads it to Amazon S3, and copies it into the staging Redshift table:
TRUNCATE TABLE "nz_data4extractor_nz_data4extractor"."_cdc_unit";

COPY "nz_data4extractor_nz_data4extractor"."_cdc_unit" 
    ("id", "ts", "cmnt", "deletexid_")
    FROM 's3://bucket/folder/unit_2.manifest' MANIFEST
    CREDENTIALS '...'
    REGION '...'
    REMOVEQUOTES
    IGNOREHEADER 1
    GZIP
    DELIMITER '|';  

  17. Then, AWS SCT applies the changes to the target table. AWS SCT removes the deleted rows, removes the old versions of updated rows, and then inserts the new rows and the latest versions of updated rows:
DELETE 
    FROM  "nz_data4extractor_nz_data4extractor"."cdc_demo"
    USING "nz_data4extractor_nz_data4extractor"."_cdc_unit" t2
    WHERE "nz_data4extractor_nz_data4extractor"."cdc_demo"."id" = t2."id"
        AND COALESCE(CAST("nz_data4extractor_nz_data4extractor"."cdc_demo"."ts" AS VARCHAR),'?#-*') = COALESCE(CAST(t2."ts" AS VARCHAR),'?#-*')
        AND COALESCE(CAST("nz_data4extractor_nz_data4extractor"."cdc_demo"."cmnt" AS VARCHAR),'?#-*') = COALESCE(CAST(t2."cmnt" AS VARCHAR),'?#-*')
        AND t2.deletexid_ > 0;
  
INSERT INTO "nz_data4extractor_nz_data4extractor"."cdc_demo"("id", "ts", "cmnt")
SELECT 
    "id", 
    "ts", 
    "cmnt"
FROM "nz_data4extractor_nz_data4extractor"."_cdc_unit" t2
WHERE t2.deletexid_ = 0;
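The delete-then-insert pattern can be illustrated with a toy Python model. This is an illustrative sketch, not AWS SCT code: rows are (id, ts, cmnt) tuples, and the staging contents mirror a subset of the example above.

```python
# Current target rows on Amazon Redshift before the apply step (toy model).
target = {
    (1, "14:05:11", "One"),
    (2, "14:05:11", "Two"),
    (4, "14:05:11", "Four"),
    (5, "14:05:11", "Five"),
}

# Staging rows: (row, deletexid_). deletexid_ > 0 marks a version to
# delete from the target; deletexid_ == 0 marks a version to insert.
staging = [
    ((1, "14:05:11", "One"), 2691856),    # row was deleted at the source
    ((2, "14:05:11", "Two"), 2691860),    # old version of an update
    ((2, "14:05:11", "Updated Two"), 0),  # latest version of the update
    ((6, "14:07:09", "Six"), 0),          # newly inserted row
]

# Step 1: DELETE target rows that match a staged old/removed version.
target -= {row for row, deletexid in staging if deletexid > 0}
# Step 2: INSERT new rows and the latest versions of updated rows.
target |= {row for row, deletexid in staging if deletexid == 0}

print(sorted(target))
```

Because the staging table carries both the old and new versions of an updated row, an update becomes a delete of the old version followed by an insert of the new one, which fits Amazon Redshift's append-oriented MPP storage better than in-place updates.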

  18. You can compare the data on the source and target to verify that AWS SCT captured all changes correctly:
SELECT * FROM NZ_DATA4EXTRACTOR.CDC_DEMO
    ORDER BY ID;

ID|TS                     |CMNT            |
--+-----------------------+----------------+
 2|2023-03-07 14:05:11.000|Updated Two     |
 3|2023-03-07 14:05:11.000|Three           |
 4|2023-03-07 14:05:11.000|Yet Again Four  |
 6|2023-03-07 14:07:09.000|Six             |
 8|2023-03-07 14:07:10.000|Updated Eight   |

The data on Amazon Redshift matches exactly:

id|ts                        |cmnt
 2|2023-03-07 14:05:11.000000|Updated Two
 3|2023-03-07 14:05:11.000000|Three
 4|2023-03-07 14:05:11.000000|Yet Again Four
 6|2023-03-07 14:07:09.000000|Six
 8|2023-03-07 14:07:10.000000|Updated Eight

In the previous examples, we showed how to run full load and CDC tasks. You can also create a CDC migration task without the full load. The process is the same: you provide AWS SCT with the transaction ID to start the replication from.

The CDC process doesn't have a significant impact on the source side. AWS SCT runs only SELECT statements there, using transaction IDs as boundaries for the WHERE clause. The performance impact of these statements is always smaller than the impact of the DML statements generated by customer applications. For the machines where the AWS SCT data extraction agents run, the CDC-related workload is always smaller than the full load workload because the volume of transferred data is smaller.

On the target side, for Amazon Redshift, the CDC process can generate considerable additional workload. The reason is that this process issues INSERT and DELETE statements, and these can result in overhead for MPP systems such as Amazon Redshift. Refer to Top 10 performance tuning techniques for Amazon Redshift for best practices and tips on how to boost the performance of your Redshift cluster.

Conclusion

In this post, we showed how to configure ongoing data replication for a Netezza database migration to Amazon Redshift. You can use the described approach to automate data migration and replication from your IBM Netezza database to Amazon Redshift. Or, if you're considering a migration of your existing Netezza workloads to Amazon Redshift, you can use AWS SCT to automatically convert your database schemas and migrate data. Download the latest version of AWS SCT and give it a try!

We're happy to share these updates to help you with your data warehouse migration projects. In the meantime, you can learn more about Amazon Redshift and AWS SCT. Happy migrating!


About the Authors

Mykhailo Kondak is a Database Engineer on the AWS Database Migration Service team at AWS. He uses his experience with different database technologies to help Amazon customers move their on-premises data warehouses and big data workloads to the AWS Cloud. In his spare time, he plays soccer.

Illia Kravtsov is a Database Engineer on the AWS Database Migration Service team. He has over 10 years of experience in data warehouse development with Teradata and other massively parallel processing (MPP) databases.

Michael Soo is a Principal Database Engineer on the AWS Database Migration Service team. He builds products and services that help customers migrate their database workloads to the AWS Cloud.


