
Synchronize your Salesforce and Snowflake data to speed up your time to insight with Amazon AppFlow


This post was co-written with Amit Shah, Principal Consultant at Atos.

Customers across industries seek meaningful insights from the data captured in their Customer Relationship Management (CRM) systems. To achieve this, they combine their CRM data with a wealth of information already available in their data warehouse, enterprise systems, or other software as a service (SaaS) applications. One widely used approach is getting the CRM data into your data warehouse and keeping it up to date through frequent data synchronization.

Integrating third-party SaaS applications is often complicated and requires significant effort and development. Developers need to understand the application APIs, write implementation and test code, and maintain the code for future API changes. Amazon AppFlow, which is a low-code/no-code AWS service, addresses this challenge.

Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications, like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS services like Amazon Simple Storage Service (Amazon S3) and Amazon Redshift in just a few clicks. With Amazon AppFlow, you can run data flows at enterprise scale at the frequency you choose: on a schedule, in response to a business event, or on demand.

In this post, we focus on synchronizing your data from Salesforce to Snowflake (on AWS) without writing code. This post walks you through the steps to set up a data flow to handle full and incremental data loads using an example use case.

Solution overview

Our use case involves the synchronization of the Account object from Salesforce into Snowflake. In this architecture, you use Amazon AppFlow to filter and transfer the data to your Snowflake data warehouse.

You can configure Amazon AppFlow to run your data ingestion in three different ways:

  • On-demand – You can manually run the flow through the AWS Management Console, API, or SDK call.
  • Event-driven – Amazon AppFlow can subscribe and listen to change data capture (CDC) events from the source SaaS application.
  • Scheduled – Amazon AppFlow can run schedule-triggered flows based on a predefined schedule rule. With scheduled flows, you can choose either full or incremental data transfer:
    • With full transfer, Amazon AppFlow transfers a snapshot of all records at the time of the flow run from the source to the destination.
    • With incremental transfer, Amazon AppFlow transfers only the records that have been added or changed since the last successful flow run. To determine the incremental delta of your data, Amazon AppFlow requires you to specify a source timestamp field to instruct how Amazon AppFlow identifies new or updated records.

We use the on-demand trigger for the initial load of data from Salesforce to Snowflake, because it helps you pull all the records, regardless of when they were created. To then synchronize data periodically with Snowflake, after we run the on-demand trigger, we configure a scheduled trigger with incremental transfer. With this approach, Amazon AppFlow pulls the records based on a chosen timestamp field from the Salesforce Account object periodically, based on the time interval specified in the flow.

The Account_Staging table is created in Snowflake to act as temporary storage that can be used to identify the data change events. Then the permanent table (Account) is updated from the staging table by running a SQL stored procedure that contains the incremental update logic. The following figure depicts the various components of the architecture and the data flow from the source to the target.

The data flow contains the following steps:

  1. First, the flow is run with on-demand and full transfer mode to load the full data into Snowflake.
  2. The Amazon AppFlow Salesforce connector pulls the data from Salesforce and stores it in the Account Data S3 bucket in CSV format.
  3. The Amazon AppFlow Snowflake connector loads the data into the Account_Staging table.
  4. A scheduled task, running at regular intervals in Snowflake, triggers a stored procedure.
  5. The stored procedure starts an atomic transaction that loads the data into the Account table and then deletes the data from the Account_Staging table.
  6. After the initial data is loaded, you update the flow to capture incremental updates from Salesforce. The flow trigger configuration is changed to scheduled, to capture data changes in Salesforce. This enables Snowflake to get all updates, deletes, and inserts in Salesforce at configured intervals.
  7. The flow uses the configured LastModifiedDate field to determine incremental changes.
  8. Steps 3, 4, and 5 are run again to load the incremental updates into the Snowflake Account table.

Prerequisites

To get started, you need the following prerequisites:

  • A Salesforce user account with sufficient privileges to install connected apps. Amazon AppFlow uses a connected app to communicate with Salesforce APIs. If you don’t have a Salesforce account, you can sign up for a developer account.
  • A Snowflake account with sufficient permissions to create and configure the integration, external stage, table, stored procedures, and tasks.
  • An AWS account with access to AWS Identity and Access Management (IAM), Amazon AppFlow, and Amazon S3.

Set up Snowflake configuration and Amazon S3 data

Complete the following steps to configure Snowflake and set up your data in Amazon S3:

  1. Create two S3 buckets in your AWS account: one for holding the data coming from Salesforce, and another for holding error records.

A best practice when creating your S3 bucket is to make sure you block public access to the bucket so that your data is not accessible by unauthorized users.

  2. Create an IAM policy named snowflake-access that allows listing the bucket contents and reading S3 objects inside the bucket.

Follow the instructions for steps 1 and 2 in Configuring a Snowflake Storage Integration to Access Amazon S3 to create an IAM policy and role. Replace the placeholders with your S3 bucket names.

  3. Log in to your Snowflake account and create a new warehouse called SALESFORCE and database called SALESTEST.
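The following is a minimal sketch of this step; the warehouse size and suspend settings shown here are assumptions, so adjust them for your workload:

CREATE WAREHOUSE IF NOT EXISTS SALESFORCE
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS SALESTEST;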
  4. Specify the format in which data will be available in Amazon S3 for Snowflake to load (for this post, CSV):
USE DATABASE SALESTEST;
CREATE OR REPLACE FILE FORMAT my_csv_format
  type = csv
  field_delimiter = ','
  skip_header = 1
  null_if = ('NULL', 'null')
  empty_field_as_null = true
  compression = gzip;

  5. Amazon AppFlow uses the Snowflake COPY command to move data using an S3 bucket. To configure this integration, follow steps 3–6 in Configuring a Snowflake Storage Integration to Access Amazon S3.

These steps create a storage integration with your S3 bucket, update IAM roles with Snowflake account and user details, and create an external stage.
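For reference, the following condensed sketch shows the kind of objects those steps create; the integration name, stage name, role ARN, and bucket name below are placeholders, so use the exact commands from the linked Snowflake documentation:

CREATE OR REPLACE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<data-bucket>/');

-- DESC INTEGRATION returns the IAM user ARN and external ID to add to the role's trust policy
DESC INTEGRATION s3_integration;

CREATE OR REPLACE STAGE SALESTEST.PUBLIC.salesforce_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://<data-bucket>/'
  FILE_FORMAT = my_csv_format;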

This completes the integration setup in Snowflake. In the next section, you create the required database objects in Snowflake.

Create schemas and procedures in Snowflake

In your Snowflake account, complete the following steps to create the tables, stored procedures, and tasks for implementing the use case:

  1. In your Snowflake account, open a worksheet and run the following DDL scripts to create the Account and Account_Staging tables:
CREATE OR REPLACE TABLE ACCOUNT_STAGING (
  ACCOUNT_NUMBER STRING NOT NULL,
  ACCOUNT_NAME STRING,
  ACCOUNT_TYPE STRING,
  ANNUAL_REVENUE NUMBER,
  ACTIVE BOOLEAN NOT NULL,
  DELETED BOOLEAN,
  LAST_MODIFIED_DATE STRING,
  primary key (ACCOUNT_NUMBER)
);

CREATE OR REPLACE TABLE ACCOUNT (
  ACCOUNT_NUMBER STRING NOT NULL,
  ACCOUNT_NAME STRING,
  ACCOUNT_TYPE STRING,
  ANNUAL_REVENUE NUMBER,
  ACTIVE BOOLEAN NOT NULL,
  LAST_MODIFIED_DATE STRING,
  primary key (ACCOUNT_NUMBER)
);

  2. Create a stored procedure in Snowflake to load data from the staging table to the Account table:
CREATE OR REPLACE PROCEDURE sp_account_load()
returns varchar not null
language sql
as
$$
begin
  begin transaction;
  merge into ACCOUNT using ACCOUNT_STAGING
    on ACCOUNT.ACCOUNT_NUMBER = ACCOUNT_STAGING.ACCOUNT_NUMBER
    when matched AND ACCOUNT_STAGING.DELETED=TRUE then delete
    when matched then UPDATE SET
      ACCOUNT.ACCOUNT_NAME = ACCOUNT_STAGING.ACCOUNT_NAME,
      ACCOUNT.ACCOUNT_TYPE = ACCOUNT_STAGING.ACCOUNT_TYPE,
      ACCOUNT.ANNUAL_REVENUE = ACCOUNT_STAGING.ANNUAL_REVENUE,
      ACCOUNT.ACTIVE = ACCOUNT_STAGING.ACTIVE,
      ACCOUNT.LAST_MODIFIED_DATE = ACCOUNT_STAGING.LAST_MODIFIED_DATE
    when NOT matched then
      INSERT (
        ACCOUNT.ACCOUNT_NUMBER,
        ACCOUNT.ACCOUNT_NAME,
        ACCOUNT.ACCOUNT_TYPE,
        ACCOUNT.ANNUAL_REVENUE,
        ACCOUNT.ACTIVE,
        ACCOUNT.LAST_MODIFIED_DATE
      )
      values (
        ACCOUNT_STAGING.ACCOUNT_NUMBER,
        ACCOUNT_STAGING.ACCOUNT_NAME,
        ACCOUNT_STAGING.ACCOUNT_TYPE,
        ACCOUNT_STAGING.ANNUAL_REVENUE,
        ACCOUNT_STAGING.ACTIVE,
        ACCOUNT_STAGING.LAST_MODIFIED_DATE
      );

  delete from ACCOUNT_STAGING;
  commit;
  -- return a status message, as declared by RETURNS VARCHAR NOT NULL
  return 'Account load completed';
end;
$$
;

This stored procedure determines whether the data contains new records that need to be inserted or existing records that need to be updated or deleted. After a successful run, the stored procedure clears any data from your staging table.
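You can also invoke the procedure manually to test it before scheduling it; this minimal sketch assumes a few records are already present in ACCOUNT_STAGING:

-- run the merge once and confirm the staging table was cleared
CALL sp_account_load();
SELECT COUNT(*) FROM ACCOUNT_STAGING;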

  3. Create a task in Snowflake to trigger the stored procedure. Make sure the time interval for this task is longer than the time interval configured in Amazon AppFlow for pulling the incremental changes from Salesforce. The time interval should be sufficient for the data to be processed.
CREATE OR REPLACE TASK TASK_ACCOUNT_LOAD
  WAREHOUSE = SALESFORCE
  SCHEDULE = 'USING CRON 5 * * * * America/Los_Angeles'
AS
  call sp_account_load();

  4. Provide the required permissions to run the task and resume the task:
  • As soon as a task is created, it is in a suspended state, so you need to resume it manually the first time:
ALTER TASK TASK_ACCOUNT_LOAD RESUME;

  • If the role assigned to you doesn’t have the required access to resume or execute tasks, you need to grant the execute task privilege to that role:
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE SYSADMIN;
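To confirm the task is resumed and to review its runs, you can query the task metadata; this is a sketch using the standard TASK_HISTORY table function:

SHOW TASKS LIKE 'TASK_ACCOUNT_LOAD';

-- recent executions, including state and any error message
SELECT name, state, scheduled_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'TASK_ACCOUNT_LOAD'))
ORDER BY scheduled_time DESC;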

This completes the configuration and setup on the Snowflake side.

Create a Salesforce connection

First, let’s create a Salesforce connection that can be used by Amazon AppFlow to authenticate and pull records from your Salesforce instance. On the AWS console, make sure you are in the same Region where your Snowflake instance is running.

  1. On the Amazon AppFlow console, choose Connections in the navigation pane.
  2. From the list of connectors, select Salesforce.
  3. Choose Create connection.
  4. For Connection name, enter a name of your choice (for example, Salesforce-blog).
  5. Leave the rest of the fields as default and choose Continue.
  6. You’re redirected to a sign-in page, where you need to log in to your Salesforce instance.
  7. After you allow Amazon AppFlow access to your Salesforce account, your connection is successfully created.

Create a Snowflake connection

Complete the following steps to create your Snowflake connection:

  1. On the Connections menu, choose Snowflake.
  2. Choose Create connection.
  3. Provide information for the Warehouse, Stage name, and Bucket details fields.
  4. Enter your credential details.

  5. For Region, choose the same Region where Snowflake is running.
  6. For Connection name, name your connection Snowflake-blog.
  7. Leave the rest of the fields as default and choose Connect.

Create a flow in Amazon AppFlow

Now you create a flow in Amazon AppFlow to load the data from Salesforce to Snowflake. Complete the following steps:

  1. On the Amazon AppFlow console, choose Flows in the navigation pane.
  2. Choose Create flow.
  3. On the Specify flow details page, enter a name for the flow (for example, AccountData-SalesforceToSnowflake).
  4. Optionally, provide a description for the flow and tags.
  5. Choose Next.

  6. On the Configure flow page, for Source name, choose Salesforce.
  7. Choose the Salesforce connection we created in the previous step (Salesforce-blog).
  8. For Choose Salesforce object, choose Account.
  9. For Destination name, choose Snowflake.
  10. Choose the newly created Snowflake connection.
  11. For Choose Snowflake object, choose the staging table you created earlier (SALESTEST.PUBLIC.ACCOUNT_STAGING).

  12. In the Error handling section, provide your error S3 bucket.
  13. For Choose how to trigger the flow, select Run on demand.
  14. Choose Next.

  15. Select Manually map fields to map the fields between your source and destination.
  16. Choose the fields Account Number, Account Name, Account Type, Annual Revenue, Active, Deleted, and Last Modified Date.

  17. Map each source field to its corresponding destination field.
  18. Under Additional settings, leave the Import deleted records option unchecked (default setting).

  19. In the Validations section, add validations for the data you’re pulling from Salesforce.

Because the schema for the Account_Staging table in the Snowflake database has a NOT NULL constraint for the fields Account_Number and Active, records containing a null value for these fields should be ignored.

  20. Choose Add Validation to configure validations for these fields.
  21. Choose Next.

  22. Leave everything else as default, proceed to the final page, and choose Create Flow.
  23. After the flow is created, choose Run flow.

When the flow run completes successfully, it brings all the records into your Snowflake staging table.

Verify data in Snowflake

The data is loaded into the Account_Staging table. To verify that the data is loaded in Snowflake, complete the following steps (a query sketch follows the list):

  1. Validate the number of records by querying the ACCOUNT_STAGING table in Snowflake.
  2. Wait for your Snowflake task to run based on the configured schedule.
  3. Verify that all the data is transferred to the ACCOUNT table and the ACCOUNT_STAGING table is truncated.
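The following is a minimal verification sketch, assuming the SALESTEST.PUBLIC schema used earlier:

-- immediately after the flow run, the staged record count should match the Account count in Salesforce
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT_STAGING;

-- after the scheduled task has run, the records move to the permanent table and the staging table is emptied
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT;
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT_STAGING;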

Configure an incremental data load from Salesforce

Now let’s configure an incremental data load from Salesforce:

  1. On the Amazon AppFlow console, select your flow, and choose Edit.
  2. Go to the Edit configuration step and change the trigger to Run flow on schedule.
  3. Set the flow to run every 5 minutes, and provide a start date of Today, with a start time in the future.
  4. Choose Incremental transfer and choose the LastModifiedDate field.
  5. Choose Next.
  6. In the Additional settings section, select Import deleted records.

This ensures that deleted records from the source are also ingested.

  7. Choose Save and then choose Activate flow.

Now your flow is configured to capture all incremental changes.

Test the solution

Log in to your Salesforce account, and edit any record in the Account object.

Within 5 minutes or less, a scheduled flow will pick up your change, write the changed record into your Snowflake staging table, and trigger the synchronization process.

You can see the details of the run, including the number of records transferred, on the Run History tab of your flow.
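To confirm the change landed in Snowflake, you can also query the permanent table; this sketch simply lists the most recently modified records (it assumes LAST_MODIFIED_DATE is stored as an ISO 8601 string, so lexical ordering works):

-- the record you edited in Salesforce should appear at the top
SELECT ACCOUNT_NUMBER, ACCOUNT_NAME, LAST_MODIFIED_DATE
FROM SALESTEST.PUBLIC.ACCOUNT
ORDER BY LAST_MODIFIED_DATE DESC
LIMIT 5;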

Clean up

Clean up the resources in your AWS account by completing the following steps:

  1. On the Amazon AppFlow console, choose Flows in the navigation pane.
  2. From the list of flows, select the flow AccountData-SalesforceToSnowflake and delete it.
  3. Enter delete to delete the flow.
  4. Choose Connections in the navigation pane.
  5. Choose Salesforce from the list of connectors, select Salesforce-blog, and delete it.
  6. Enter delete to delete the connector.
  7. On the Connections page, choose Snowflake from the list of connectors, select Snowflake-blog, and delete it.
  8. Enter delete to delete the connector.
  9. On the IAM console, choose Roles in the navigation pane, then select the role you created for Snowflake and delete it.
  10. Choose Policies in the navigation pane, select the policy you created for Snowflake, and delete it.
  11. On the Amazon S3 console, search for the data bucket you created, choose Empty to delete the objects, then delete the bucket.
  12. Search for the error bucket you created, choose Empty to delete the objects, then delete the bucket.
  13. Clean up the resources in your Snowflake account:
  • Delete the task TASK_ACCOUNT_LOAD:
ALTER TASK TASK_ACCOUNT_LOAD SUSPEND;
DROP TASK TASK_ACCOUNT_LOAD;

  • Delete the stored procedure sp_account_load:
DROP PROCEDURE sp_account_load();

  • Delete the tables ACCOUNT_STAGING and ACCOUNT:
DROP TABLE ACCOUNT_STAGING;
DROP TABLE ACCOUNT;

Conclusion

In this post, we walked you through how to integrate and synchronize your data from Salesforce to Snowflake using Amazon AppFlow. This demonstrates how you can set up your ETL jobs without having to learn new programming languages, by using Amazon AppFlow and your familiar SQL. This is a proof of concept, but you can extend it to handle edge cases like failure of Snowflake tasks, or explore how incremental transfer behaves by making multiple changes to a Salesforce record within the scheduled time interval.

For more information on Amazon AppFlow, visit Amazon AppFlow.


About the authors

Ramesh Ranganathan is a Senior Partner Solutions Architect at AWS. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, application modernization, and cloud native development. He is passionate about technology and enjoys experimenting with AWS Serverless services.

Kamen Sharlandjiev is an Analytics Specialist Solutions Architect and Amazon AppFlow expert. He’s on a mission to make life easier for customers who are dealing with complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding.

Amit Shah is a cloud-based modern data architecture expert and currently leads the AWS Data Analytics practice at Atos. Based in Pune, India, he has 20+ years of experience in data strategy, architecture, design, and development. He is on a mission to help organizations become data-driven.


