Using upstream Apache Airflow Hooks and Operators in Cloud Composer

Category: PHP · Published: 6 years ago

Source: Using upstream Apache Airflow Hooks and Operators in Cloud Composer from Google Cloud

For engineers or developers in charge of integrating, transforming, and loading a variety of data from an ever-growing collection of sources and systems, Cloud Composer has dramatically reduced the number of cycles spent on workflow logistics. Built on Apache Airflow, Cloud Composer makes it easy to author, schedule, and monitor data pipelines across multiple clouds and on-premises data centers.

Let’s walk through an example of how Cloud Composer makes building a pipeline across public clouds easier. As you design your new workflow that’s going to bring data from another cloud (Microsoft Azure’s ADLS, for example) into Google Cloud, you notice that upstream Apache Airflow already has an ADLS hook that you can use to copy data. You insert an import statement into your DAG file, save, and attempt to test your workflow. “ImportError – no module named x.” Now what?

As it turns out, functionality that has been committed upstream, such as brand-new Hooks and Operators, might not have made its way into Cloud Composer just yet. Don’t worry, though: you can still use these upstream additions through the Apache Airflow Plugin interface.

Using the upstream AzureDataLakeHook as an example, all you have to do is the following:

  1. Copy the code into a separate file (ensuring adherence to the Apache License)

  2. Import the AirflowPlugin module (from airflow.plugins_manager import AirflowPlugin)

  3. Add a snippet to the bottom of the file that defines an AirflowPlugin subclass listing the copied hook in its hooks attribute.
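The snippet for step 3 is not reproduced on this page. A minimal sketch of what such a plugin file could look like follows, assuming the standard AirflowPlugin attributes name and hooks. The plugin name azure_data_lake_plugin and the placeholder hook class are illustrative, not taken from the original post, and the ImportError fallback exists only so the sketch also runs where Airflow is not installed.

```python
# Sketch of a plugin file, for example a hypothetical adls_plugin.py.
try:
    from airflow.plugins_manager import AirflowPlugin
except ImportError:
    # Stand-in so the sketch can run without Airflow installed.
    class AirflowPlugin(object):
        name = None
        hooks = []


class AzureDataLakeHook(object):
    """Placeholder: in a real plugin file, this is the hook code copied
    from upstream Apache Airflow in step 1."""


class AzureDataLakePlugin(AirflowPlugin):
    # Hypothetical plugin name (an assumption, not from the original post).
    name = "azure_data_lake_plugin"
    # Register the copied hook class with the plugin.
    hooks = [AzureDataLakeHook]
```

In the Airflow 1.x releases, hooks registered this way were importable from airflow.hooks.<plugin name>; newer Airflow versions dropped that mechanism and import the plugin module directly, so check which applies to your environment.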

Once you have completed the above steps, you need to ensure that all other dependencies required by the functionality you added are included in your Cloud Composer environment. In this example we need to include the azure-datalake-store package. To install this package into your environment, you can use the Cloud Console. Navigate to Cloud Composer, click on your environment, followed by PyPI Packages, and then click “Edit.” It may take a few moments for the operation to complete, but once it succeeds, you should see a view similar to the screenshot below:

[Screenshot not preserved: PyPI Packages view of the Cloud Composer environment]

Next, we need to make the plugin available to the Cloud Composer environment. To do this, you can copy the plugin to the plugins folder following the instructions here. This command will look something like this:
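The command itself is not preserved on this page. As a hedged sketch, assuming the intended command is gcloud composer environments storage plugins import, the Python helper below assembles the argument list and prints it. The environment name, location, and file name are placeholders, and build_plugin_import_command and import_plugin are hypothetical helper names.

```python
import subprocess


def build_plugin_import_command(environment, location, source):
    """Return the gcloud argument list that imports a local plugin file."""
    return [
        "gcloud", "composer", "environments", "storage", "plugins", "import",
        "--environment", environment,
        "--location", location,
        "--source", source,
    ]


def import_plugin(environment, location, source):
    """Run the import; needs an installed and authenticated gcloud CLI."""
    subprocess.check_call(
        build_plugin_import_command(environment, location, source)
    )


# Placeholder values: substitute your own environment, region, and file path.
command = build_plugin_import_command(
    "my-environment", "us-central1", "adls_plugin.py"
)
print(" ".join(command))
```

Printing the command first makes it easy to review before calling import_plugin, which actually changes the environment's bucket.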

Once the plugin has been imported, you can now use it. This simple example snippet shows how to import the plugin and leverage the AzureDataLakeHook functionality that the plugin now provides in conjunction with the GoogleCloudStorageHook to copy data from ADLS to Cloud Storage:
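The example snippet is not preserved on this page. The sketch below shows one way the copy could work, assuming the ADLS hook exposes download_file(local_path, remote_path) and the Cloud Storage hook exposes upload(bucket, object, filename), as the upstream hooks do. The function name copy_adls_to_gcs and the temporary file name are illustrative. The hooks are passed in as arguments so the function can be exercised without Airflow installed.

```python
import os
import shutil
import tempfile


def copy_adls_to_gcs(adls_hook, gcs_hook, adls_path, bucket, object_name):
    """Copy one file from ADLS to Cloud Storage via a local temporary file."""
    tmp_dir = tempfile.mkdtemp()
    # Fixed local file name inside a fresh temporary directory.
    local_path = os.path.join(tmp_dir, "adls_download")
    try:
        # Pull the remote ADLS file down to local disk.
        adls_hook.download_file(local_path, adls_path)
        # Push the local copy to the Cloud Storage bucket.
        gcs_hook.upload(bucket, object_name, local_path)
    finally:
        # Remove the temporary directory whether or not the copy succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Inside a DAG file on Airflow 1.x, the hooks might be wired up like this
# (the plugin name is the hypothetical one chosen when defining the plugin):
#   from airflow.hooks.azure_data_lake_plugin import AzureDataLakeHook
#   from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
#   copy_adls_to_gcs(AzureDataLakeHook(), GoogleCloudStorageHook(),
#                    "path/in/adls.csv", "my-bucket", "path/in/gcs.csv")
```

A function like this can be called from a PythonOperator task; staging through a temporary file keeps the sketch simple but means the worker needs enough local disk for the largest file copied.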

You could easily extend this to create a more robust Operator that provides this functionality, and use the same workflow to make that available to your specific workflows.

In summary, you can use features from the upstream Apache Airflow codebase, including newer connectors to external data sources, even with Cloud Composer, Google’s managed Airflow service. For more on working with upstream components, check out the Airflow documentation here.

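Except as otherwise noted, the content of this article is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Terms of Service.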


