Debug an Airflow DAG

Supademo

See how DataMates uses AI to quickly debug Airflow DAGs. By combining platform integrations and best-practice templates from the Knowledge Hub, DataMates identifies issues, suggests fixes, and applies improvements automatically. Shared memories across DataMates boost context and consistency for faster, smarter troubleshooting.

Walkthrough

The demo above walks through the following steps for streamlining the debugging process. With a DataMate, engineers can stay in their IDE without context switching or poring over Airflow log files.

Creating a DataMate: The presentation begins by showing how to create a DataMate with the necessary integrations. The example uses an "Airflow tester" DataMate configured with Memory, Airflow, GitHub, Databricks, and Jira integrations.

Adding Context from Knowledge Hub: Altimate's Knowledge Hub is introduced as a way to provide context to agents. It offers templates that users can fork and modify to fit their company's best practices. The "Airflow Cookbook" from the Knowledge Hub is used in this example to help debug Airflow DAGs. The URL of the knowledge document can be copied and provided to the coding assistant for context.

Debugging an Airflow DAG: A problematic Airflow DAG named asset1_producer is shown to have failed runs, indicated by red "X failed" statuses. The bug is identified as a ZeroDivisionError due to error_count being 0 in a success rate calculation.
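To make the failure mode concrete, here is a minimal sketch of how such a bug can arise. The actual code of asset1_producer is not shown on this page, so the function name `success_to_error_ratio` and the formula are hypothetical; only the `error_count` variable and the `ZeroDivisionError` come from the demo.

```python
# Hypothetical reconstruction of the failing calculation; the real DAG code
# is not shown in this walkthrough.
def success_to_error_ratio(success_count: int, error_count: int) -> float:
    # Dividing by error_count raises ZeroDivisionError when no errors occurred.
    return success_count / error_count


failure = None
try:
    success_to_error_ratio(10, 0)
except ZeroDivisionError as exc:
    # Inside an Airflow task, this uncaught exception would mark the task as failed.
    failure = exc
    print(f"Task would fail with: {exc!r}")
```

Note that the failure only appears on runs with zero errors, which is why a bug like this can pass early testing and surface later in production.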

AI-Assisted Triage and Fixes: The user describes the problem to the coding assistant and provides the link to the "Airflow Cookbook" from the Knowledge Hub. Using its built-in memory and the provided context, the DataMate identifies the ZeroDivisionError and suggests fixes based on the best practices outlined in the cookbook, such as input validation, safe numerical operations, structured logging, and proper exception handling. The agent then applies the recommended code changes.
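The kinds of fixes listed above might look like the following sketch. This is not the code the agent produced in the demo. The function name `safe_success_rate`, the choice to divide by the total record count, and the fallback value of 0.0 are all assumptions made for illustration.

```python
import logging

# Logger name borrowed from the DAG name in the demo; any name works.
logger = logging.getLogger("asset1_producer")


def safe_success_rate(success_count: int, error_count: int) -> float:
    """Return the fraction of successful records, in the range [0.0, 1.0]."""
    # Input validation: reject non-integer and negative counts early.
    for name, value in (("success_count", success_count), ("error_count", error_count)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")

    total = success_count + error_count
    # Safe numerical operation: guard the division instead of letting it raise.
    if total == 0:
        logger.warning("No records processed; reporting success rate 0.0")
        return 0.0  # assumed fallback; a real pipeline might skip or fail instead

    rate = success_count / total
    logger.info(
        "success_rate=%.3f success_count=%d error_count=%d",
        rate, success_count, error_count,
    )
    return rate
```

With this version, a run with zero errors returns 1.0 rather than raising, and an empty run is logged as a warning instead of failing the task. Whether an empty run should fail, skip, or report a default is a policy decision that the cookbook or team conventions would settle.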

Successful DAG Run and Memory Hub: After applying the fixes, the DAG runs successfully. Finally, the presentation highlights the Memory Hub, which stores a list of memories shared across DataMates, leading to enhanced contextual understanding, increased efficiency, and reduced repetition.