beam-commits mailing list archives

[jira] [Commented] (BEAM-1983) SDF should properly support windowed side inputs

Date

Sat, 15 Apr 2017 16:41:41 GMT

[ https://issues.apache.org/jira/browse/BEAM-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970016#comment-15970016
]
Eugene Kirpichov commented on BEAM-1983:
----------------------------------------
Yeah, I understand that much better now. I suppose we can remove that part after pre-exploding
windows. Another advantage is that it'll make us perform just 1 ProcessElement call per KeyedWorkItem,
which seems a lot easier to reason about, given that each call produces a residual restriction.
> SDF should properly support windowed side inputs
> ------------------------------------------------
>
> Key: BEAM-1983
> URL: https://issues.apache.org/jira/browse/BEAM-1983
> Project: Beam
> Issue Type: Bug
> Components: runner-apex, runner-dataflow, runner-direct, runner-flink, sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
>
> Currently there is no test coverage for Splittable DoFn + windowed side inputs, especially when not all of the side input windows are ready.
> Moreover, the current implementation of SDF in the direct runner is definitely wrong: it uses a ParDoEvaluator to run the ProcessFn, and this ParDoEvaluator looks at the wrong windows to decide which windows are ready and which are not: https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluator.java#L134 - the WindowedValue in question is a KeyedWorkItem, and KeyedWorkItems are always in the global window, but the windows that matter are the windows of the elements inside this KWI's elementsIterable().
> The Flink implementation is also wrong in the same way.
> This JIRA is to:
> 1) add test coverage for this case
> 2) implement proper support in all runners
> I believe the easiest way to do 2) is to:
> - make SplittableParDo, in case the DoFn has side inputs, pre-explode windows before feeding them into GroupByKeyIntoKeyedWorkItems, so that the resulting KWIs have elements only in a single window
> - tweak runners to look at the proper window, and assert that there's only one window, while evaluating ProcessFn, in case the DoFn uses side inputs
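
The two steps proposed above could be sketched roughly as follows. This is a minimal plain-Java model, not Beam code: the Windowed class, the string window names, and the methods explodeWindows, groupByKeyAndWindow and singleWindow are all hypothetical stand-ins made up for illustration. Grouping by (key, window) is just one assumed way to end up with single-window work items; the issue does not say how the real transform would do it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreExplodeSketch {
    /** Hypothetical stand-in for a windowed element: one value, possibly in several windows. */
    public static final class Windowed {
        public final String key;
        public final String value;
        public final List<String> windows;

        public Windowed(String key, String value, List<String> windows) {
            this.key = key;
            this.value = value;
            this.windows = windows;
        }
    }

    /** Step 1: emit one copy of each element per window, so every copy is in exactly one window. */
    public static List<Windowed> explodeWindows(List<Windowed> input) {
        List<Windowed> out = new ArrayList<>();
        for (Windowed w : input) {
            for (String window : w.windows) {
                out.add(new Windowed(w.key, w.value, Arrays.asList(window)));
            }
        }
        return out;
    }

    /** Assumed grouping: one group per (key, window), modelling a single-window work item. */
    public static Map<String, List<Windowed>> groupByKeyAndWindow(List<Windowed> exploded) {
        Map<String, List<Windowed>> groups = new LinkedHashMap<>();
        for (Windowed w : exploded) {
            String groupKey = w.key + "@" + singleWindow(Arrays.asList(w));
            groups.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    /** Step 2: return the one window shared by all elements, failing if there is more than one. */
    public static String singleWindow(List<Windowed> workItem) {
        String window = null;
        for (Windowed w : workItem) {
            if (w.windows.size() != 1) {
                throw new IllegalStateException("element is not in exactly one window: " + w.windows);
            }
            String current = w.windows.get(0);
            if (window != null && !window.equals(current)) {
                throw new IllegalStateException("work item spans windows " + window + " and " + current);
            }
            window = current;
        }
        if (window == null) {
            throw new IllegalStateException("empty work item");
        }
        return window;
    }

    public static void main(String[] args) {
        List<Windowed> input = Arrays.asList(
            new Windowed("k", "a", Arrays.asList("w1", "w2")),
            new Windowed("k", "b", Arrays.asList("w2")));
        Map<String, List<Windowed>> groups = groupByKeyAndWindow(explodeWindows(input));
        for (Map.Entry<String, List<Windowed>> e : groups.entrySet()) {
            // The window found here is the one a runner would use to check side input readiness.
            System.out.println(e.getKey() + " -> " + e.getValue().size()
                + " element(s), window " + singleWindow(e.getValue()));
        }
    }
}
```

In this model, calling singleWindow on the un-exploded input fails, which is the situation the assertion in the second step is meant to catch.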