Re: Pinning dependencies for Apache Airflow

Hi all,
Have you considered looking into Poetry[1]? I’ve had really good experiences with it: we introduced it into our project specifically because we were getting version conflicts, and it resolved them just fine. It properly supports semantic versioning, so package versions have upper bounds. It also has a full dependency resolver, so even when package upgrades are available, it will only upgrade if the version constraints allow it. It does have some issues, though, most notably that it depends on package metadata being correct to resolve dependencies properly, and that’s not always the case.
Cheers,
Björn
[1]: https://poetry.eustace.io/
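The upper bounds Björn mentions come from Poetry's caret requirements. According to Poetry's documentation, `^1.2.3` means `>=1.2.3,<2.0.0`, while `^0.2.3` means `>=0.2.3,<0.3.0`, because the leftmost non-zero component is the one that gets bumped. The sketch below reproduces that rule in plain Python, assuming numeric `X.Y.Z` versions with no pre-release tags; `caret_bounds` and `satisfies_caret` are hypothetical helpers written for illustration, not part of Poetry's API.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> List[int]:
    """Split a dotted numeric version such as "1.2" into [1, 2]."""
    return [int(part) for part in text.split(".")]


def pad(parts: List[int]) -> Version:
    """Pad or truncate to exactly three components: [1, 2] -> (1, 2, 0)."""
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return (major, minor, patch)


def caret_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return the (inclusive lower, exclusive upper) bounds of a caret constraint.

    The upper bound bumps the leftmost non-zero component that was written;
    if every written component is zero, it bumps the last written one.
    """
    parts = parse_version(constraint.lstrip("^"))
    nonzero = [i for i, value in enumerate(parts) if value != 0]
    bump_at = nonzero[0] if nonzero else len(parts) - 1
    upper = parts[:bump_at] + [parts[bump_at] + 1]
    return pad(parts), pad(upper)


def satisfies_caret(version: str, constraint: str) -> bool:
    """True if `version` falls inside the caret constraint's range."""
    lower, upper = caret_bounds(constraint)
    return lower <= pad(parse_version(version)) < upper


# Usage: a newer minor release is accepted, a new major release is rejected.
print(caret_bounds("^1.2.3"))              # ((1, 2, 3), (2, 0, 0))
print(satisfies_caret("1.9.0", "^1.2.3"))  # True
print(satisfies_caret("2.0.0", "^1.2.3"))  # False
```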
> On 5. Oct 2018, at 03:58, James Meickle <jmeickle@xxxxxxxxxxxxxx.INVALID> wrote:
>
> I suggest not adopting pipenv. It has a nice "first five minutes" demo, but
> it's simply not baked enough to depend on as a drop-in pip replacement. We
> are in the process of removing it after finding several serious bugs in our
> POC of it.
>
> On Thu, Oct 4, 2018, 20:30 Alex Guziel <alex.guziel@xxxxxxxxxx.invalid>
> wrote:
>
>> FWIW, there's some value in using virtualenv with Docker to isolate
>> yourself from your system's Python.
>>
>> It's worth noting that requirements files can include other requirements
>> files, which would make groups easier, but note that a single pip run gives
>> no guarantee that transitive dependencies won't conflict with or override
>> each other. You need pip check for that, or use --no-deps.
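To illustrate Alex's point: a requirements file can pull in others with `-r other.txt`, but flattening them does nothing to reconcile duplicate pins. A minimal sketch, assuming the hypothetical file names and illustrative pins below (held in a dict rather than on disk so the example is self-contained), that expands nested `-r` includes and flags any package pinned to more than one version. `flatten` and `conflicting_pins` are hypothetical helpers, not pip functions.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical requirements files with illustrative pins, kept in memory so
# the sketch is self-contained; in practice they would be read from disk.
FILES: Dict[str, str] = {
    "requirements.txt": "-r base.txt\n-r gcp.txt\nflask==1.0.2\n",
    "base.txt": "# core pins\nflask==0.12.4\nclick==6.7\n",
    "gcp.txt": "-r base.txt\ngoogle-api-python-client==1.7.4\n",
}

# Matches an include line such as "-r base.txt" or "--requirement base.txt".
INCLUDE = re.compile(r"^(?:-r|--requirement)\s+(\S+)$")


def flatten(name: str, files: Dict[str, str],
            seen: Optional[Set[str]] = None) -> List[str]:
    """Expand nested -r includes depth-first, reading each file only once."""
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)
    lines: List[str] = []
    for raw in files[name].splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = INCLUDE.match(line)
        if match:
            lines.extend(flatten(match.group(1), files, seen))
        else:
            lines.append(line)
    return lines


def conflicting_pins(lines: List[str]) -> Dict[str, Set[str]]:
    """Return packages that are pinned (==) to more than one version."""
    pins: Dict[str, Set[str]] = {}
    for line in lines:
        if "==" in line:
            package, version = line.split("==", 1)
            pins.setdefault(package.strip().lower(), set()).add(version.strip())
    return {pkg: versions for pkg, versions in pins.items() if len(versions) > 1}


flat = flatten("requirements.txt", FILES)
print(flat)  # ['flask==0.12.4', 'click==6.7', 'google-api-python-client==1.7.4', 'flask==1.0.2']
print(sorted(conflicting_pins(flat)["flask"]))  # ['0.12.4', '1.0.2']
```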
>>
>> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fokko@xxxxxxxxxxxxxx>
>> wrote:
>>
>>> Hi Jarek,
>>>
>>> Thanks for bringing this up. I missed the discussion on Slack since I'm on
>>> holiday, but I saw the thread and it was way too interesting, and therefore
>>> this email :)
>>>
>>> This is actually something that we need to address asap. Like you mention,
>>> we've seen before that specific transitive dependencies are not compatible,
>>> and then we end up with a broken CI, or even worse, a broken release.
>>> Earlier we had the fixed versions (==) in setup.py and the requirements for
>>> the CI in a separate requirements.txt. This was also far from optimal,
>>> since we had two versions of the requirements.
>>>
>>> I like the idea that you are proposing. Maybe we can do an experiment with
>>> it. Because of the nature of Airflow (orchestrating different systems), we
>>> have a huge list of dependencies. To avoid installing everything, we've
>>> created groups, for example specific libraries for when you're using Google
>>> Cloud, Elastic, Druid, etc. So I'm curious how it will work with the
>>> `extras_require` of Airflow.
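For reference, setuptools takes these groups as an `extras_require` mapping from extra name to a list of requirement strings, alongside `install_requires`, and users select them with syntax like `pip install 'package[extra]'`. A sketch of how pinned groups might combine for a given selection; the extra names, packages and versions below are hypothetical and are not Airflow's actual lists, and `requirements_for` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Hypothetical core requirements and extras; names and pins are illustrative.
INSTALL_REQUIRES: List[str] = ["flask==1.0.2", "click==6.7"]
EXTRAS_REQUIRE: Dict[str, List[str]] = {
    "gcp": ["google-api-python-client==1.7.4"],
    "elasticsearch": ["elasticsearch==5.5.3"],
    "druid": ["pydruid==0.4.5"],
}
# Convenience group that pulls in every optional dependency.
EXTRAS_REQUIRE["all"] = sorted({req for reqs in EXTRAS_REQUIRE.values() for req in reqs})


def requirements_for(extras: Iterable[str]) -> List[str]:
    """Core requirements plus those of each requested extra, without duplicates."""
    selected = list(INSTALL_REQUIRES)
    for extra in extras:
        for req in EXTRAS_REQUIRE[extra]:
            if req not in selected:
                selected.append(req)
    return selected


print(requirements_for(["gcp"]))
# ['flask==1.0.2', 'click==6.7', 'google-api-python-client==1.7.4']
```

In a real setup.py the two dicts would be passed to `setuptools.setup(install_requires=..., extras_require=...)`. The helper only shows which pins a given selection brings in, which is exactly the set that has to be mutually compatible once everything is pinned with ==.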
>>>
>>> Regarding pipenv: I don't use pipenv/virtualenv anymore. For me, Docker is
>>> much easier to work with. I'm also working on a PR to get rid of tox for
>>> testing and move to a more Docker-idiomatic test pipeline. Curious what
>>> your thoughts are on that.
>>>
>>> Cheers, Fokko
>>>
>>> On Thu, 4 Oct 2018 at 15:39, Arthur Wiedmer <arthur.wiedmer@xxxxxxxxx> wrote:
>>>
>>>> Thanks Jakob!
>>>>
>>>> I think this is a huge risk with Slack.
>>>> I am not against Slack as a support channel, but it is a slippery slope to
>>>> have more and more decisions/conversations happening there, contrary to
>>>> what we hope to achieve with the ASF.
>>>>
>>>> When we start discussing issues of development, extensions and
>>>> improvements, it is important for the discussion to happen on the mailing
>>>> list.
>>>>
>>>> Jarek, I wouldn't worry too much; we are still in the process of learning
>>>> as a community. Welcome, and thank you for your contribution!
>>>>
>>>> Best,
>>>> Arthur.
>>>>
>>>> On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Thanks for pointing it out Jakob.
>>>>>
>>>>> I am still very fresh in the ASF community and learning the ropes,
>>>>> etiquette and code of conduct. Apologies for my ignorance.
>>>>> I have now re-read the code of conduct and the FAQ, with more
>>>>> understanding, and will pay more attention to wording in the future. As
>>>>> you mentioned, it's more the wording than the intentions, but since it
>>>>> was in the TL;DR, it has stronger consequences.
>>>>>
>>>>> BTW, thanks for actually following the code of conduct and pointing it out
>>>>> in a respectful manner. I really appreciate it.
>>>>>
>>>>> J.
>>>>>
>>>>> Principal Software Engineer
>>>>> Phone: +48660796129
>>>>>
>>>>> On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jghoman@xxxxxxxxx> wrote:
>>>>>
>>>>>>> TL;DR; A change is coming in the way how dependencies/requirements are
>>>>>>> specified for Apache Airflow - they will be fixed rather than flexible (==
>>>>>>> rather than >=).
>>>>>>
>>>>>>> This is follow up after Slack discussion we had with Ash and Kaxil -
>>>>>>> summarising what we propose we'll do.
>>>>>>
>>>>>> Hey all. It's great that we're moving this discussion back from Slack
>>>>>> to the mailing list. But I've gotta point out that the wording needs
>>>>>> a small but critical fix-up:
>>>>>>
>>>>>> "A change *is* coming... they *will* be fixed"
>>>>>>
>>>>>> needs to be
>>>>>>
>>>>>> "We'd like to propose a change... We would like to make them fixed."
>>>>>>
>>>>>> The first says that this decision has been made and the result of the
>>>>>> decision, which was made on Slack, is being reported back to the
>>>>>> mailing list. The second is more accurate to the rest of the
>>>>>> discussion ('what we propose...'). And again, since it's axiomatic in
>>>>>> the ASF that if it didn't happen on a list, it didn't happen[1], we gotta
>>>>>> make sure there's no confusion about where the community is in the
>>>>>> decision-making process.
>>>>>>
>>>>>> Thanks,
>>>>>> Jakob
>>>>>>
>>>>>> [1] https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects?
>>>>>
>>>>>> On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel <alex.guziel@xxxxxxxxxx.invalid> wrote:
>>>>>>>
>>>>>>> You should run `pip check` to ensure no conflicts. Pip does not do this
>>>>>>> on its own.
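`pip check` compares the installed versions against the requirements each installed package declares. The sketch below imitates that idea on in-memory data; the package names and constraints are hypothetical, `check` is a hypothetical helper rather than pip's implementation, and only `==`, `>=` and `<` on numeric versions are supported, unlike pip's full specifier handling.

```python
import operator
from typing import Callable, Dict, List, Tuple

Compare = Callable[[Tuple[int, ...], Tuple[int, ...]], bool]

# The only specifier operators this sketch understands.
OPS: Dict[str, Compare] = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}

# Hypothetical environment: installed versions and each package's declared requirements.
INSTALLED: Dict[str, str] = {"app-builder": "1.12.0", "login-lib": "0.4.1", "shared-lib": "7.0"}
REQUIRES: Dict[str, List[Tuple[str, str, str]]] = {
    "app-builder": [("shared-lib", ">=", "6.7")],
    "login-lib": [("shared-lib", "<", "7.0"), ("extra-lib", ">=", "1.0")],
}


def as_tuple(version: str) -> Tuple[int, ...]:
    """Numeric version tuple with trailing zeros removed, so 7.0 == 7.0.0."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check(installed: Dict[str, str],
          requires: Dict[str, List[Tuple[str, str, str]]]) -> List[str]:
    """Return one message per missing or unsatisfied requirement."""
    problems: List[str] = []
    for package, deps in requires.items():
        for dep, op, wanted in deps:
            have = installed.get(dep)
            if have is None:
                problems.append(f"{package} requires {dep}, which is not installed.")
            elif not OPS[op](as_tuple(have), as_tuple(wanted)):
                problems.append(
                    f"{package} has requirement {dep}{op}{wanted}, but you have {dep} {have}."
                )
    return problems


for problem in check(INSTALLED, REQUIRES):
    print(problem)
```

Both packages can be "successfully" installed in one pip run; the conflict only shows up when something walks the declared requirements afterwards, which is the gap Alex is pointing at.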
>>>>>>>
>>>>>>> On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
>>>> Jarek.Potiuk@xxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Great that this discussion already happened :). Lots of useful things in
>>>>>>>> it. And yes, it means pinning in requirements.txt; this is how pip-tools
>>>>>>>> works.
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>> On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
>>>> arthur.wiedmer@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Jarek,
>>>>>>>>>
>>>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
>>>>>>>>>
>>>>>>>>> I just want to double check we are talking about pinning in
>>>>>>>>> requirements.txt only.
>>>>>>>>>
>>>>>>>>> This offers the ability to run
>>>>>>>>>     pip install -r requirements.txt
>>>>>>>>>     pip install --no-deps airflow
>>>>>>>>> for a guaranteed install that works.
>>>>>>>>>
>>>>>>>>> Several different requirement files can be provided for specific use
>>>>>>>>> cases, for instance a stable dev one for people wanting to work on
>>>>>>>>> operators and non-core functions.
>>>>>>>>>
>>>>>>>>> However, I think we should proactively test in CI against unpinned
>>>>>>>>> dependencies (though it might be a separate case in the matrix), so that
>>>>>>>>> we get advance warning, if possible, that things will break.
>>>>>>>>> CI downtime is not a bad thing here; it actually caught a problem :)
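One way to get the unpinned matrix entry Arthur describes is to derive it from the pinned file, so both CI jobs install the same set of packages and differ only in versions. A minimal sketch, assuming simple `name[extras]==version` lines; `unpin` is a hypothetical helper, not a pip or pip-tools command.

```python
import re
from typing import List

# A package name, optionally followed by [extras]; whatever follows
# (==1.0, >=2, ...) is treated as the version specifier and dropped.
NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?)")


def unpin(requirements_text: str) -> List[str]:
    """Turn pinned requirement lines into bare names for an 'unpinned' CI job.

    Assumes simple `name[extras]==version` lines: comments, blank lines and
    pip options such as `-r other.txt` are skipped, and environment markers
    are dropped rather than preserved.
    """
    names: List[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


pinned = "flask==1.0.2\n# comment\ncelery[redis]==4.1.1\n-r base.txt\n"
print(unpin(pinned))  # ['flask', 'celery[redis]']
```

The resulting names could be written to a second requirements file used only by the unpinned matrix job, so a fresh upstream release shows up as a failure there before it affects the pinned builds.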
>>>>>>>>>
>>>>>>>>> We should unpin as much as possible in setup.py and only maintain the
>>>>>>>>> minimum required compatibility. Pinning in setup.py is extremely
>>>>>>>>> detrimental when you have a large number of Python libraries installed
>>>>>>>>> with different pinned versions.
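With `>=` floors in setup.py and exact pins in requirements.txt, the two lists can drift apart, which is the "two versions of the requirements" problem Fokko mentions. A hedged sketch of a consistency check, assuming `name>=version` lines in `install_requires` and `name==version` lines in requirements.txt; `below_minimum` and `parse` are hypothetical helpers and the inputs are illustrative.

```python
from typing import Dict, List, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Numeric version tuple with trailing zeros removed, so 0.12 == 0.12.0."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def parse(lines: List[str], op: str) -> Dict[str, str]:
    """Map lower-cased package name -> version for lines like `name<op>version`."""
    result: Dict[str, str] = {}
    for line in lines:
        name, version = line.split(op, 1)
        result[name.strip().lower()] = version.strip()
    return result


def below_minimum(install_requires: List[str], pinned: List[str]) -> List[str]:
    """Packages whose requirements.txt pin is older than the setup.py floor."""
    floors = parse(install_requires, ">=")
    pins = parse(pinned, "==")
    return sorted(
        name for name, floor in floors.items()
        if name in pins and as_tuple(pins[name]) < as_tuple(floor)
    )


# Hypothetical inputs: setup.py floors and requirements.txt pins.
print(below_minimum(["flask>=0.12", "click>=6.7"], ["flask==1.0.2", "click==6.6"]))
# ['click']
```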
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Arthur
>>>>>>>>>
>>>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@xxxxxxxxxxx.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Relevant discussion about this:
>>>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> TL;DR; A change is coming in the way how dependencies/requirements are
>>>>>>>>>>> specified for Apache Airflow - they will be fixed rather than flexible
>>>>>>>>>>> (== rather than >=).
>>>>>>>>>>>
>>>>>>>>>>> This is follow up after Slack discussion we had with Ash and Kaxil -
>>>>>>>>>>> summarising what we propose we'll do.
>>>>>>>>>>>
>>>>>>>>>>> *Problem:*
>>>>>>>>>>> During the last few weeks we experienced quite a few downtimes of
>>>>>>>>>>> TravisCI builds (for all PRs/branches including master) as some of the
>>>>>>>>>>> transitive dependencies were automatically upgraded. This is because in
>>>>>>>>>>> a number of dependencies we have >= rather than == requirements.
>>>>>>>>>>>
>>>>>>>>>>> Whenever there is a new release of such a dependency, it might cause a
>>>>>>>>>>> chain reaction of upgrades of transitive dependencies, which might get
>>>>>>>>>>> into conflict.
>>>>>>>>>>>
>>>>>>>>>>> An example was the Flask-AppBuilder vs flask-login transitive dependency
>>>>>>>>>>> on click. They started to conflict once AppBuilder released version
>>>>>>>>>>> 1.12.0.
>>>>>>>>>>>
>>>>>>>>>>> *Diagnosis:*
>>>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is used instead
>>>>>>>>>>> of ==) are a cause of "dependency hell". Sooner or later we will hit
>>>>>>>>>>> other cases where unpinned dependencies cause similar problems with other
>>>>>>>>>>> transitive dependencies. We need to fix-pin them. This causes problems
>>>>>>>>>>> both for released versions (because they stop working!) and for
>>>>>>>>>>> development (because they break master builds in TravisCI and prevent
>>>>>>>>>>> people from installing the development environment from scratch).
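The diagnosis in miniature: an installer that takes the newest version matching `>=X` gives a different answer as soon as upstream publishes a new release, while `==X` keeps resolving to the same version. The release history below is hypothetical, and `resolve` is a toy stand-in that only handles `==` and `>=`, not pip's actual resolution algorithm.

```python
from typing import List, Optional, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Numeric version tuple with trailing zeros removed, so 1.12 == 1.12.0."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def resolve(spec: str, available: List[str]) -> Optional[str]:
    """Pick the newest available version matching `==X` or `>=X` (toy resolver)."""
    op, wanted = spec[:2], spec[2:]
    if op == "==":
        candidates = [v for v in available if as_tuple(v) == as_tuple(wanted)]
    elif op == ">=":
        candidates = [v for v in available if as_tuple(v) >= as_tuple(wanted)]
    else:
        raise ValueError(f"unsupported spec: {spec}")
    return max(candidates, key=as_tuple, default=None)


# Hypothetical release history of one dependency, before and after upstream
# publishes a new version.
BEFORE = ["1.11.1", "1.11.2"]
AFTER = BEFORE + ["1.12.0"]

for spec in (">=1.11", "==1.11.2"):
    print(spec, resolve(spec, BEFORE), "->", resolve(spec, AFTER))
# >=1.11 1.11.2 -> 1.12.0
# ==1.11.2 1.11.2 -> 1.11.2
```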
>>>>>>>>>>>
>>>>>>>>>>> *Solution:*
>>>>>>>>>>>
>>>>>>>>>>>    - Following the old-but-good post
>>>>>>>>>>>    https://nvie.com/posts/pin-your-packages/ we are going to pin
>>>>>>>>>>>    dependencies to specific versions (so basically all dependencies
>>>>>>>>>>>    are "fixed").
>>>>>>>>>>>    - We will introduce a mechanism to upgrade dependencies with
>>>>>>>>>>>    pip-tools (https://github.com/jazzband/pip-tools). We might also
>>>>>>>>>>>    take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
>>>>>>>>>>>    - People who would like to upgrade some dependencies for their PRs
>>>>>>>>>>>    will still be able to do so, but such upgrades will be part of their
>>>>>>>>>>>    PR, so they will go through TravisCI tests, and they will also have
>>>>>>>>>>>    to be specified with pinned fixed versions (==). This should be part
>>>>>>>>>>>    of the review process, to make sure new/changed requirements are
>>>>>>>>>>>    pinned.
>>>>>>>>>>>    - In the release process there will be a point where an upgrade is
>>>>>>>>>>>    attempted for all requirements (using pip-tools) so that we are not
>>>>>>>>>>>    stuck with older releases. This will happen in a controlled PR
>>>>>>>>>>>    environment where there will be time to fix all dependencies without
>>>>>>>>>>>    impacting others, and likely enough time to "vet" such changes (this
>>>>>>>>>>>    can be done for alpha/beta releases, for example).
>>>>>>>>>>>    - As a side effect, the dependency specification will become far
>>>>>>>>>>>    simpler and more straightforward.
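The review rule in the third bullet (new or changed requirements must be pinned with ==) could also be checked mechanically in CI. A minimal sketch, assuming one requirement per line and no backslash continuations (so hash-annotated output, for example, is out of scope); `unpinned_lines` is a hypothetical helper, not a pip or pip-tools command.

```python
import re
from typing import List, Tuple

# `name[extras]==version`, optionally followed by `; environment-marker`.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s;]+(\s*;.*)?$")


def unpinned_lines(requirements_text: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for every requirement not pinned with ==.

    Comments, blank lines and pip options such as `-r other.txt` are ignored.
    """
    offenders: List[Tuple[int, str]] = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            offenders.append((number, line))
    return offenders


sample = (
    "flask==1.0.2\n"
    "click>=6.7\n"
    "# a comment\n"
    "-r base.txt\n"
    "requests\n"
    "celery[redis]==4.1.1 ; python_version >= '3'\n"
)
print(unpinned_lines(sample))  # [(2, 'click>=6.7'), (5, 'requests')]
```

A CI step could fail the build whenever this list is non-empty, which turns the "make sure requirements are pinned" review item into an automatic check.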
>>>>>>>>>>>
>>>>>>>>>>> Happy to hear community comments on the proposal. I am happy to take the
>>>>>>>>>>> lead on this, open a JIRA issue and implement it if this is something
>>>>>>>>>>> the community is happy with.
>>>>>>>>>>>
>>>>>>>>>>> J.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
>>>>>>>>>>> Mobile: +48 660 796 129
>>>>>>>>>>>