Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it easy to set up and operate end-to-end data pipelines in the cloud. Trusted across various industries, Amazon MWAA helps organizations like Siemens, ENGIE, and Choice Hotels International enhance and scale their business workflows, while significantly improving security and reducing infrastructure management overhead.
Today, we're announcing the availability of Apache Airflow version 2.6.3 environments. If you're currently running Apache Airflow version 2.x, you can seamlessly upgrade to v2.6.3 using in-place version upgrades, thereby retaining your workflow run history and environment configurations.
In this post, we delve into some of the new features and capabilities of Apache Airflow v2.6.3 and how you can set up or upgrade your Amazon MWAA environment to accommodate this version as you orchestrate your workflows in the cloud at scale.
New feature: Notifiers
Airflow now gives you an efficient way to create reusable and standardized notifications to handle systemic errors and failures. Notifiers introduce a new object in Airflow, designed to be an extensible layer for adding notifications to DAGs. This framework can send messages to external systems when a task instance or an individual DAG run changes state. You can build notification logic from a new base object and call it directly from your DAG files. The BaseNotifier is an abstract class that provides a basic structure for sending notifications in Airflow using the various on_*_callback hooks. It is intended for providers to extend and customize for their specific needs.
Using this framework, you can build custom notification logic directly within your DAG files. For instance, notifications can be sent via email, Slack, or Amazon Simple Notification Service (Amazon SNS) based on the state of a DAG (on_failure, on_success, and so on). You can also create your own custom notifier that updates an API or posts a file to your storage system of choice.
For details on how to create and use a notifier, refer to Creating a notifier.
New feature: Managing tasks stuck in a queued state
Apache Airflow v2.6.3 brings a significant improvement to address the long-standing issue of tasks getting stuck in the queued state when using the CeleryExecutor. In a typical Apache Airflow workflow, tasks progress through a lifecycle, moving from the scheduled state to the queued state, and eventually to the running state. However, tasks can occasionally remain in the queued state longer than expected due to communication issues among the scheduler, the executor, and the worker. In Amazon MWAA, customers have experienced such tasks being queued for up to 12 hours due to the way it uses the native integration of Amazon Simple Queue Service (Amazon SQS) with the CeleryExecutor.
To mitigate this issue, Apache Airflow v2.6.3 introduced a mechanism that checks the Airflow database for tasks that have remained in the queued state beyond a specified timeout, defaulting to 600 seconds. This default can be modified using the environment configuration parameter scheduler.task_queued_timeout. The system then retries such tasks if retries are still available, or fails them otherwise, ensuring that your data pipelines continue to run smoothly.
Notably, this update deprecates the previously used celery.stalled_task_timeout and celery.task_adoption_timeout settings, and consolidates their functionality into a single configuration, scheduler.task_queued_timeout. This enables simpler management of tasks that remain in the queued state. Operators can also configure scheduler.task_queued_timeout_check_interval, which controls how frequently the system checks for tasks that have stayed in the queued state beyond the defined timeout.
For details on how to use task_queued_timeout, refer to the official Airflow documentation.
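As an illustration, the timeout and its check interval could be tuned in airflow.cfg like the following (in Amazon MWAA, the equivalent is set through the environment's Airflow configuration options). The 900- and 120-second values are examples, not recommendations:

```ini
[scheduler]
# Retry or fail tasks that have been queued longer than 15 minutes
task_queued_timeout = 900
# Check for such tasks every 2 minutes
task_queued_timeout_check_interval = 120
```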
New feature: A new continuous timetable and support for continuous schedule
With prior versions of Airflow, to run a DAG continuously in a loop, you had to use the TriggerDagRunOperator to rerun the DAG after the last task finished. With Apache Airflow v2.6.3, you can now run a DAG continuously with a predefined timetable. This simplifies scheduling for continual DAG runs. The new ContinuousTimetable construct will create one continuous DAG run, respecting start_date and end_date, with the new run starting as soon as the previous run has completed, regardless of whether the previous run succeeded or failed. Using a continuous timetable is especially useful when sensors are used to wait for highly irregular events from external data tools.
You can bound the degree of parallelism to ensure that only one DAG run is active at any given time with the max_active_runs parameter:
New feature: Trigger DAG UI extension with flexible user form concept
Prior to Apache Airflow v2.6.3, you could provide parameters in a JSON structure through the Airflow UI for custom workflow runs. You had to model, check, and understand the JSON and enter parameters manually, without the option to validate them before triggering a workflow. With Apache Airflow v2.6.3, when you choose Trigger DAG w/ config, a trigger UI form is rendered based on the predefined DAG Params. For your ad hoc, testing, or custom runs, this simplifies the DAG's parameter entry. If the DAG has no parameters defined, a JSON entry mask is shown. The form elements can be defined with the Param class, and attributes define how a form field is displayed.
For an example DAG, the following form is generated from the DAG's Params.
Set up a new Apache Airflow v2.6.3 environment
You can set up a new Apache Airflow v2.6.3 environment in your account and preferred Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you're adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.
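For example, a new environment could be created with the AWS CLI along the following lines; all names, ARNs, and IDs here are placeholders you would replace with your own values:

```shell
aws mwaa create-environment \
  --name my-airflow-env \
  --airflow-version 2.6.3 \
  --source-bucket-arn arn:aws:s3:::my-dags-bucket \
  --dag-s3-path dags \
  --execution-role-arn arn:aws:iam::111122223333:role/my-mwaa-role \
  --network-configuration SubnetIds=subnet-aaaa1111,subnet-bbbb2222,SecurityGroupIds=sg-cccc3333
```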
Once you have successfully created an Apache Airflow v2.6.3 environment in Amazon MWAA, the following packages are automatically installed on the scheduler and worker nodes, along with other provider packages:
For a complete list of provider packages installed, refer to Apache Airflow provider packages installed on Amazon MWAA environments.
Upgrade from older versions of Apache Airflow to Apache Airflow v2.6.3
You can perform in-place version upgrades of your existing Amazon MWAA environments to update your older Apache Airflow v2.x-based environments to v2.6.3. To learn more about in-place version upgrades, refer to Upgrading the Apache Airflow version or Introducing in-place version upgrades with Amazon MWAA.
Conclusion
In this post, we talked about some of the new features of Apache Airflow v2.6.3 and how you can get started using them in Amazon MWAA. Try out these new features, like notifiers and continuous timetables, and the other enhancements to improve your data orchestration pipelines.
For more details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.
Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
About the authors
Hernan Garcia is a Senior Solutions Architect at AWS, based out of Amsterdam, working in the Financial Services industry since 2018. He specializes in application modernization and supports his customers in the adoption of cloud operating models and serverless technologies.
Parnab Basak is a Solutions Architect and a Serverless Specialist at AWS. He specializes in creating new cloud-native solutions using modern software development practices like serverless, DevOps, and analytics. Parnab works closely in the analytics and integration services space, helping customers adopt AWS services for their workflow orchestration needs.
Shubham Mehta is an experienced product manager with over eight years of experience and a proven track record of delivering successful products. In his current role as a Senior Product Manager at AWS, he oversees Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and spearheads Apache Airflow open-source contributions to further enhance the product's functionality.