Apache Beam


A big data platform? Apache Beam is a library for parallel data processing: you define a pipeline with its SDK, and a runner executes it.


Pipeline: A pipeline is a sequence of data transformations.

PCollection: An unordered collection that can be processed in parallel. PCollection elements might live on multiple worker machines.

PTransform: A processing function that takes PCollections as input and produces PCollections as output.


DirectRunner: A PipelineRunner that executes a Pipeline within the process that constructed the Pipeline. Useful for running small amounts of data during testing and integration debugging.

DataflowRunner: Submits the pipeline to GCP (Google Cloud Dataflow).

Written by Lu.dev on 15 September 2021