beam-commits mailing list archives

[jira] [Commented] (BEAM-542) Spark batch interval should be a configuration instead of an interpretation of the Pipeline's windows

Date: Wed, 10 Aug 2016 10:36:22 GMT

[ https://issues.apache.org/jira/browse/BEAM-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415091#comment-15415091 ]
ASF GitHub Bot commented on BEAM-542:
-------------------------------------
GitHub user amitsela opened a pull request:
https://github.com/apache/incubator-beam/pull/808
[BEAM-542] Spark batch interval should be a configuration instead of an interpretation of the Pipeline's windows
---
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/amitsela/incubator-beam BEAM-542
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/808.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #808
----
commit f0201f7f1426d1b5d7058aff02202a9cb54b3ac0
Author: Sela <ansela@paypal.com>
Date: 2016-08-10T10:30:30Z
Add the batch interval to the pipeline options, default arbitrarily to 1000 msec.
commit d0eab7b8a4f179d1c2beefe471a983c26c75ce86
Author: Sela <ansela@paypal.com>
Date: 2016-08-10T10:32:14Z
Pick-up the batch interval from pipeline options and remove StreamingWindowPipelineDetector.
commit 8c6a7b5ca04b61af2f6b8acef695f9ffe1aa32a0
Author: Sela <ansela@paypal.com>
Date: 2016-08-10T10:32:59Z
Use SDK API to get the window function.
commit 3cf66d7d25829548436abbb56b0699612a534768
Author: Sela <ansela@paypal.com>
Date: 2016-08-10T10:33:25Z
Update the README
commit c22232e333aa86b5ac97d25ac7d6a2b83f699f34
Author: Sela <ansela@paypal.com>
Date: 2016-08-10T10:33:34Z
Update streaming tests
----
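For readers skimming the commit list, here is a minimal sketch of what "add the batch interval to the pipeline options" can look like. It is plain Java and does not use the runner's actual classes: the class name, the `--batchIntervalMillis` flag, and the validation are all assumptions made for illustration. Only the 1000 msec default comes from the first commit message.

```java
// Hypothetical stand-in for a runner options object; NOT the SparkRunner's real API.
final class BatchIntervalOptionsSketch {
    // Default taken from the first commit message ("default arbitrarily to 1000 msec").
    static final long DEFAULT_BATCH_INTERVAL_MILLIS = 1000L;

    // Hypothetical flag name, chosen for illustration only.
    private static final String FLAG = "--batchIntervalMillis=";

    // Returns the configured batch interval, or the default when the flag is absent.
    static long batchIntervalMillis(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(FLAG)) {
                long value = Long.parseLong(arg.substring(FLAG.length()));
                if (value <= 0) {
                    throw new IllegalArgumentException("batch interval must be positive: " + value);
                }
                return value;
            }
        }
        return DEFAULT_BATCH_INTERVAL_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println("batch interval (ms): " + batchIntervalMillis(args));
    }
}
```

The point of the sketch is that the interval is read from execution-time configuration, so the pipeline definition itself never has to mention it.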
> Spark batch interval should be a configuration instead of an interpretation of the Pipeline's windows
> -----------------------------------------------------------------------------------------------------
>
> Key: BEAM-542
> URL: https://issues.apache.org/jira/browse/BEAM-542
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Amit Sela
> Assignee: Amit Sela
>
> Currently, the SparkRunner extracts the batch interval from the duration of the first window.
> This is wrong in several ways:
> # GlobalWindow pipelines have no window duration to extract.
> # The batch interval is an engine-specific property and should not be expressed as part of the pipeline logic, but rather as a configuration for executing the pipeline.
> # It effectively forces the definition of Fixed/SlidingWindows even when they are not needed (stateless processing), which also makes the pipeline code non-portable.
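One way to picture the first objection is the toy model below. It is not the runner's code: the class and method names are hypothetical, and an empty `Optional` is simply an assumed way to represent a global window with no fixed duration. What the actual runner did in that case is not stated in this message.

```java
import java.time.Duration;
import java.util.Optional;

// Toy model contrasting the two sources of a batch interval; all names are hypothetical.
final class BatchIntervalSourceSketch {
    // "Interpretation": derive the interval from the first window's size.
    // An empty Optional models a global window, which has no duration to use.
    static Duration fromFirstWindow(Optional<Duration> firstWindowSize) {
        return firstWindowSize.orElseThrow(
            () -> new IllegalStateException("no window duration to derive a batch interval from"));
    }

    // "Configuration": the interval is supplied independently of any windowing.
    static Duration fromConfiguration(long batchIntervalMillis) {
        return Duration.ofMillis(batchIntervalMillis);
    }
}
```

In this model the configured path works for any pipeline, while the derived path only works when the pipeline happens to declare a fixed-size window.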
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)