GitHub user kennknowles opened a pull request:
https://github.com/apache/beam/pull/4010
[BEAM-3074] Stage the pipeline in Python DataflowRunner
R: @aaltay (or redirect to other appropriate Python reviewer?)
Just hacked this out naively; it probably doesn't respect the abstractions quite right. I
verified just enough to confirm that the file is staged - much simpler than Java :+1:. Also no tests :1st_place_medal:.
For a manual smoke test, I followed some combination of the quickstart and the
contribution guide, but the run broke during staging because `pip install --download` doesn't
like that I had done `pip install -e .[gcp]`. Is there a doc with the steps for a new contributor
to run wordcount with local modifications? I'm a bit rusty on the approved way of setting
up the virtualenv. The crash occurs after the pipeline is staged, so I was able to check the
basics anyhow.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/beam py-stage-pipeline
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/4010.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4010
----
commit 2f5293373280ebcea618a6aa4fa1057237441512
Author: Kenneth Knowles <kenn@apache.org>
Date: 2017-10-18T20:56:28Z
Stage the pipeline in Python DataflowRunner
----