What’s new in Amazon Redshift – 2022, a year in review

January 19, 2023

In 2021 and 2020, we told you about the new features in Amazon Redshift that make it easier, faster, and more cost-effective to analyze all your data and find rich and powerful insights. In 2022, we’re happy to report that the Amazon Redshift team was hard at work. We worked backward from customer requirements and announced multiple new features to make it easier, faster, and more cost-effective to analyze all your data. This post covers some of these new features.

At AWS, for data and analytics, our strategy is to give you a modern data architecture that helps you break free from data silos; have purpose-built data, analytics, machine learning (ML), and artificial intelligence services to use the right tool for the right job; and have open, governed, secure, and fully managed services to make analytics available to everyone. Within AWS’s modern data architecture, Amazon Redshift as the cloud data warehouse remains a key component, enabling you to run complex SQL analytics at scale and performance on terabytes to petabytes of structured and unstructured data, and make the insights widely available through popular business intelligence (BI) and analytics tools. We continue to work backward from customers’ requirements, and in 2022 launched over 40 features in Amazon Redshift to help customers with their top data warehousing use cases, including:

Self-service analytics
Easy data ingestion
Data sharing and collaboration
Data science and machine learning
Secure and reliable analytics
Best price performance analytics

Let’s dive deeper and discuss the new Amazon Redshift features in these areas.

Self-service analytics

Customers continue to tell us that data and analytics is becoming ubiquitous, and everyone in their organization needs analytics. We announced Amazon Redshift Serverless (in preview) in 2021 to make it easy to run and scale analytics in seconds without having to provision and manage data warehouse infrastructure. In July 2022, we announced the general availability of Redshift Serverless, and since then thousands of customers, including Peloton, Broadridge Financials, and NextGen Healthcare, have used it to quickly and easily analyze their data. Amazon Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver high performance for all your analytics, and you only pay for the compute used for the duration of the workloads on a per-second basis. Since GA, we have added features like resource tagging, simplified monitoring, and availability in additional AWS Regions to further simplify billing and expand the reach across more Regions worldwide.

In 2021, we launched Amazon Redshift Query Editor V2, which is a free web-based tool for data analysts, data scientists, and developers to explore, analyze, and collaborate on data in Amazon Redshift data warehouses and data lakes. In 2022, Query Editor V2 got additional enhancements such as notebook support for improved collaboration to author, organize, and annotate queries; user access through identity provider (IdP) credentials for single sign-on; and the ability to run multiple queries concurrently to improve developer productivity.

Autonomics is another area where we are actively working to use ML-based optimizations and give customers a self-learning and self-optimizing data warehouse. In 2022, we announced the general availability of Automated Materialized Views (AutoMVs) to improve the performance of queries (reduce the total runtime) without any user effort by automatically creating and maintaining materialized views. AutoMVs, combined with automatic refresh, incremental refresh, and automatic query rewriting for materialized views, made materialized views maintenance free, giving you faster performance automatically. In addition, the automatic table optimization (ATO) capability for schema optimization and automatic workload management (auto WLM) capability for workload optimization got further enhancements for better query performance.

Easy data ingestion

Customers tell us that they have their data distributed over multiple data sources like transactional databases, data warehouses, data lakes, and big data systems. They want the flexibility to integrate this data with no-code/low-code, zero-ETL data pipelines or analyze this data in place without moving it. Customers tell us that their current data pipelines are complex, manual, rigid, and slow, resulting in incomplete, inconsistent, and stale views of data, limiting insights. Customers have asked us for a better way forward, and we are pleased to announce a number of new capabilities to simplify and automate data pipelines.

Amazon Aurora zero-ETL integration with Amazon Redshift (preview) enables you to run near-real-time analytics and ML on petabytes of transactional data. It offers a no-code solution for making transactional data from multiple Amazon Aurora databases available in Amazon Redshift data warehouses within seconds of being written to Aurora, eliminating the need to build and maintain complex data pipelines. With this feature, Aurora customers can also access Amazon Redshift capabilities such as complex SQL analytics, built-in ML, data sharing, and federated access to multiple data stores and data lakes. This feature is now available in preview for Amazon Aurora MySQL-Compatible Edition version 3 (with MySQL 8.0 compatibility), and you can request access to the preview.

Amazon Redshift now supports auto-copy from Amazon S3 (preview) to simplify data loading from Amazon Simple Storage Service (Amazon S3) into Amazon Redshift. You can now set up continuous file ingestion rules (copy jobs) to track your Amazon S3 paths and automatically load new files without the need for additional tools or custom solutions. Copy jobs can be monitored through system tables, and they automatically keep track of previously loaded files and exclude them from the ingestion process to prevent data duplication. This feature is now available in preview; you can try this feature by creating a new cluster using the preview track.
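
To make the deduplication idea concrete, here is a minimal Python sketch of how a copy job might remember which files it has already loaded. It is a toy model of the concept only, not Amazon Redshift’s implementation; the CopyJob class, its run method, and the example bucket paths are all hypothetical names chosen for illustration.

```python
class CopyJob:
    """Toy model of a continuous file ingestion rule (copy job)."""

    def __init__(self, prefix):
        self.prefix = prefix   # path prefix the job watches (hypothetical)
        self.loaded = set()    # files ingested on earlier runs

    def run(self, available_files):
        """Return the files to load now, skipping anything loaded before."""
        new_files = sorted(
            f for f in available_files
            if f.startswith(self.prefix) and f not in self.loaded
        )
        self.loaded.update(new_files)
        return new_files


job = CopyJob("s3://example-bucket/orders/")

# First run: only the file under the watched prefix is picked up.
first = job.run([
    "s3://example-bucket/orders/a.csv",
    "s3://example-bucket/other/x.csv",
])

# Second run: a.csv is excluded because it was already loaded.
second = job.run([
    "s3://example-bucket/orders/a.csv",
    "s3://example-bucket/orders/b.csv",
])

print(first)
print(second)
```

Keeping the set of loaded files inside the job is what makes reruns idempotent: presenting the same file again never produces a second load.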

Customers continue to tell us that they need instantaneous, in-the-moment, real-time analytics, and we are pleased to announce the general availability of streaming ingestion support in Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK). This feature eliminates the need to stage streaming data in Amazon S3 before ingesting it into Amazon Redshift, enabling you to achieve low latency, measured in seconds, while ingesting hundreds of megabytes of streaming data per second into your data warehouses. You can use SQL within Amazon Redshift to connect to and directly ingest data from multiple Kinesis data streams or MSK topics, create auto-refreshing streaming materialized views with transformations on top of streams directly to access streaming data, and combine real-time data with historical data for better insights. For example, Adobe has integrated Amazon Redshift streaming ingestion as part of their Adobe Experience Platform for ingesting and analyzing, in real time, the web and applications clickstream and session data for various applications like CRM and customer support applications.

Customers have told us that they want simple, out-of-the-box integration between Amazon Redshift, BI and ETL (extract, transform, and load) tools, and business applications like Salesforce and Marketo. We are pleased to announce the general availability of Informatica Data Loader for Amazon Redshift, which enables you to use Informatica Data Loader for high-speed and high-volume data loading into Amazon Redshift for free. You can simply select the Informatica Data Loader option on the Amazon Redshift console. Once in Informatica Data Loader, you can connect to sources such as Salesforce or Marketo, choose Amazon Redshift as a target, and begin to load your data.

Data sharing and collaboration

Customers continue to tell us that they want to analyze all their first-party and third-party data and make the rich data-driven insights available to their customers, partners, and suppliers. We launched new features in 2021, such as Data Sharing and AWS Data Exchange integration, to make it easier for you to analyze all of your data and share it within and outside your organizations.

A great example of a customer using data sharing is Orion. Orion provides real-time data as a service (DaaS) solutions for customers in the financial services industry, such as wealth management, asset management, and investment management providers. They have over 2,500 data sources that are primarily SQL Server databases sitting both on premises and in AWS. Data is streamed using Kafka connectors into Amazon Redshift. They have a producer cluster that receives all this data and then uses Data Sharing to share data in real time for collaboration. This is a multi-tenant architecture that serves multiple clients. Given the sensitivity of their data, data sharing is a way to provide workload isolation between clusters and also securely share that data with end-users.

In 2022, we continued to invest in this area to improve the performance, governance, and developer productivity with new features to make it easier, simpler, and faster to share and collaborate on data.

As customers are building large-scale data sharing configurations, they have asked for simplified governance and security for shared data, and we are adding centralized access control with AWS Lake Formation for Amazon Redshift datashares to enable sharing live data across multiple Amazon Redshift data warehouses. With this feature, Amazon Redshift now supports simplified governance of Amazon Redshift datashares by using AWS Lake Formation as a single pane of glass to centrally manage data or permissions on datashares. You can view, modify, and audit permissions, including row-level and column-level security on the tables and views in the Amazon Redshift datashares, using Lake Formation APIs and the AWS Management Console, and allow the Amazon Redshift datashares to be discovered and consumed by other Amazon Redshift data warehouses.

Data science and machine learning

Customers continue to tell us that they want their data and analytics systems to help them answer a wide range of questions, from what is happening in their business (descriptive analytics) to why it is happening (diagnostic analytics) and what will happen in the future (predictive analytics). Amazon Redshift provides features like complex SQL analytics, data lake analytics, and Amazon Redshift ML for customers to analyze their data and discover powerful insights. Redshift ML integrates Amazon Redshift with Amazon SageMaker, a fully managed ML service, enabling you to create, train, and deploy ML models using familiar SQL commands.

Customers have also asked us for better integration between Amazon Redshift and Apache Spark, so we are excited to announce Amazon Redshift integration for Apache Spark to make data warehouses easily accessible for Spark-based applications. Now, developers using AWS analytics and ML services such as Amazon EMR, AWS Glue, and SageMaker can effortlessly build Apache Spark applications that read from and write to their Amazon Redshift data warehouses. Amazon EMR and AWS Glue package the Redshift-Spark connector so you can easily connect to your data warehouse from your Spark-based applications. You can use several pushdown capabilities for operations such as sort, aggregate, limit, join, and scalar functions so that only the relevant data is moved from your Amazon Redshift data warehouse to the consuming Spark application. You can also make your applications more secure by utilizing AWS Identity and Access Management (IAM) credentials to connect to Amazon Redshift.

Secure and reliable analytics

Customers continue to tell us that their data warehouses are mission-critical systems that need high availability, reliability, and security. We launched a number of new features in 2022 in this area.

Amazon Redshift now supports Multi-AZ deployments (in preview) for RA3 instance-based clusters, which enables running your data warehouse in multiple AWS Availability Zones simultaneously and continuous operation in unforeseen Availability Zone-wide failure scenarios. Multi-AZ support is already available for Redshift Serverless. An Amazon Redshift Multi-AZ deployment allows you to recover in case of Availability Zone failures without any user intervention. An Amazon Redshift Multi-AZ data warehouse is accessed as a single data warehouse with one endpoint, and helps you maximize performance by distributing workload processing across multiple Availability Zones automatically. No application changes are needed to maintain business continuity during unforeseen outages.

In 2022, we launched features like role-based access control, row-level security, and data masking (in preview) to make it easier for you to manage access and decide who has access to which data, including obfuscating personally identifiable information (PII) like credit card numbers.

You can use role-based access control (RBAC) to control end-user access to data at a broad or granular level based on an end-user’s job role and permissions. With RBAC, you can create a role using SQL, grant a collection of granular permissions to the role, and then assign that role to end-users. Roles can be granted object-level, column-level, and system-level permissions. Additionally, RBAC introduces out-of-box system roles for DBAs, operators, security admins, or customized roles.
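
The create-role, grant-to-role, assign-to-user flow described above can be sketched in a few lines of Python. This is only a conceptual model of RBAC, not Amazon Redshift code; the function names, the analyst role, and the sales table are hypothetical and exist purely for illustration.

```python
from collections import defaultdict

# role -> set of (privilege, object) pairs granted to that role
role_permissions = defaultdict(set)
# user -> set of roles assigned to that user
user_roles = defaultdict(set)


def grant_to_role(role, privilege, obj):
    """Grant one granular permission to a role."""
    role_permissions[role].add((privilege, obj))


def grant_role(role, user):
    """Assign a role to an end-user."""
    user_roles[user].add(role)


def is_allowed(user, privilege, obj):
    """A user is allowed if any of their roles carries the permission."""
    return any(
        (privilege, obj) in role_permissions[role]
        for role in user_roles[user]
    )


grant_to_role("analyst", "SELECT", "sales")
grant_role("analyst", "alice")

print(is_allowed("alice", "SELECT", "sales"))
print(is_allowed("alice", "DELETE", "sales"))
print(is_allowed("bob", "SELECT", "sales"))
```

Because permissions attach to the role rather than to individual users, changing what analysts may do is a single grant or revoke on the role.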

Row-level security (RLS) simplifies design and implementation of fine-grained access to the rows in tables. With RLS, you can restrict access to a subset of rows within a table based on the users’ job role or permissions with SQL.
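
As a rough illustration of the idea, the Python sketch below attaches a row predicate to each role and returns only the rows that the predicate admits. It models the concept rather than Amazon Redshift’s policy engine; the role names, the sample rows, and the choice to return no rows for a role without a policy are assumptions made for this example.

```python
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 30},
]

# Hypothetical policies: each role maps to a predicate over a row.
policies = {
    "eu_sales": lambda row: row["region"] == "EU",
    "admin": lambda row: True,
}


def visible_rows(table, role):
    """Return the rows a role may see; roles with no policy see nothing."""
    predicate = policies.get(role)
    if predicate is None:
        return []
    return [row for row in table if predicate(row)]


print(visible_rows(rows, "eu_sales"))
print(len(visible_rows(rows, "admin")))
print(visible_rows(rows, "intern"))
```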

Amazon Redshift support for dynamic data masking (DDM), which is now available in preview, allows you to simplify protecting PII such as Social Security numbers, credit card numbers, and phone numbers in your Amazon Redshift data warehouse. With dynamic data masking, you control access to your data through simple SQL-based masking policies that determine how Amazon Redshift returns sensitive data to the user at query time. You can create masking policies to define consistent, format-preserving, and irreversible masked data values. You can apply a masking policy on a specific column or list of columns in a table. Also, you have the flexibility of choosing how to show the masked data. For example, you can completely hide the data, replace partial real values with wildcard characters, or define your own way to mask the data using SQL expressions, Python, or AWS Lambda user-defined functions. Additionally, you can apply a conditional masking policy based on other columns, which selectively protects the column data in a table based on the values in one or more different columns.
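
The difference between hiding a value completely and masking it partially is easy to show in plain Python. The sketch below is an illustration of the masking styles only and is not a Redshift masking policy; the function names, the role-to-mask mapping, and the decision to fall back to a full mask for unknown roles are assumptions made for this example.

```python
def full_mask(value):
    """Hide the value completely while keeping its length."""
    return "X" * len(value)


def partial_mask(value, visible=4):
    """Replace all but the last `visible` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_hide = max(total_digits - visible, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and to_hide > 0:
            masked.append("X")
            to_hide -= 1
        else:
            masked.append(ch)
    return "".join(masked)


# Hypothetical mapping from role to the mask applied at read time.
MASK_BY_ROLE = {
    "auditor": lambda value: value,  # sees the real value
    "support": partial_mask,         # sees only the last four digits
}


def read_column(value, role):
    """Apply the role's mask; unknown roles get the most restrictive one."""
    return MASK_BY_ROLE.get(role, full_mask)(value)


card = "4111-1111-1111-1234"
print(read_column(card, "support"))  # XXXX-XXXX-XXXX-1234
print(read_column(card, "analyst"))
```

Applying the mask when the value is read, rather than when it is stored, is what makes the masking dynamic: the stored data stays unchanged and each role sees its own view.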

We also announced enhancements to audit logging, native integration with Microsoft Azure Active Directory, and support for default IAM roles in additional Regions to further simplify security management.

Best price performance analytics

Customers continue to tell us that they need fast and cost-effective data warehouses that deliver high performance at any scale while keeping costs low. From day 1 since Amazon Redshift’s launch in 2012, we have taken a data-driven approach and used fleet telemetry to build a cloud data warehouse service that gives you the best price performance at any scale. Over the years, we have evolved Amazon Redshift’s architecture and launched features such as Redshift Managed Storage (RMS) for separation of storage and compute, Amazon Redshift Spectrum for data lake queries, automatic table optimization for physical schema optimization, automatic workload management to prioritize workloads and allocate the right compute and memory, cluster resize to scale compute and storage vertically, and concurrency scaling to dynamically scale compute out or in. Our performance benchmarks continue to demonstrate Amazon Redshift’s price performance leadership.

In 2022, we added new features such as the general availability of concurrency scaling for write operations like COPY, INSERT, UPDATE, and DELETE to support virtually unlimited concurrent users and queries. We also launched performance improvements for string-based data processing through vectorized scans over lightweight, CPU-efficient, dictionary-encoded string columns, which allows the database engine to operate directly over compressed data.

We also added support for SQL operators such as MERGE (single operator for inserts or updates); CONNECT BY (for hierarchical queries); GROUPING SETS, ROLLUP, and CUBE (for multi-dimensional reporting); and increased the size of the SUPER data type to 16 MB to make it easier for you to migrate from legacy data warehouses to Amazon Redshift.
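
For readers less familiar with these operators, the Python sketch below mimics what two of them compute: MERGE as a combined update-or-insert, and ROLLUP as subtotals at each level of a dimension hierarchy. It is an illustration of the semantics over in-memory lists, not Redshift SQL; the merge and rollup helpers and the sample rows are hypothetical.

```python
from collections import defaultdict


def merge(target, source, key):
    """Update rows whose key matches, insert the rest. Returns (updated, inserted)."""
    index = {row[key]: row for row in target}
    updated = inserted = 0
    for src in source:
        match = index.get(src[key])
        if match is not None:
            match.update(src)          # matched: update in place
            updated += 1
        else:
            new_row = dict(src)        # not matched: insert a copy
            target.append(new_row)
            index[src[key]] = new_row
            inserted += 1
    return updated, inserted


def rollup(rows, dims, measure):
    """Sum `measure` for (d1, d2), (d1, None), and (None, None) style groupings."""
    totals = defaultdict(int)
    for row in rows:
        for depth in range(len(dims), -1, -1):
            key = tuple(row[d] for d in dims[:depth]) + (None,) * (len(dims) - depth)
            totals[key] += row[measure]
    return dict(totals)


inventory = [{"sku": "a", "qty": 1}, {"sku": "b", "qty": 2}]
counts = merge(inventory, [{"sku": "b", "qty": 5}, {"sku": "c", "qty": 7}], "sku")
print(counts, inventory)

sales = [
    {"region": "EU", "product": "a", "amount": 1},
    {"region": "EU", "product": "b", "amount": 2},
    {"region": "US", "product": "a", "amount": 4},
]
report = rollup(sales, ["region", "product"], "amount")
print(report[("EU", None)], report[(None, None)])
```

In this sketch a None in a grouping key marks a subtotal row, in the same way that a rolled-up column shows NULL in a SQL result.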

Conclusion

Our customers continue to tell us that data and analytics remains a top priority for them and the need to cost-effectively extract more business value from their data during these times is more pronounced than any other time in the past. Amazon Redshift as your cloud data warehouse enables you to run complex SQL analytics with scale and performance on terabytes to petabytes of structured and unstructured data and make the insights widely available through popular BI and analytics tools.

Although we launched over 40 features in 2022 and the pace of innovation continues to accelerate, it remains day 1 and we look forward to hearing from you on how these features help you unlock more value for your organizations. We invite you to try these new features and get in touch with us through your AWS account team if you have further comments.

About the author

Manan Goel is a Product Go-To-Market Leader for AWS Analytics Services including Amazon Redshift at AWS. He has more than 25 years of experience and is well versed with databases, data warehousing, business intelligence, and analytics. Manan holds an MBA from Duke University and a BS in Electronics & Communications engineering.