Just because you can write it correctly does not make the concept any less error-prone: it does not follow the principle of least surprise.

Using variables within variables (what you suggest) is exactly what the *DEPEND variables are better suited for, and what I had suggested to use instead - the idea of DEPENDENCIES in contrast was obviously to avoid this.
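For reference, the "variables within variables" idiom with the existing *DEPEND variables usually looks something like the sketch below. The atoms are made up for illustration, and COMMON_DEPEND is assumed to be an ordinary helper variable that the package manager knows nothing about.

```bash
# Hypothetical atoms; COMMON_DEPEND is just a plain shell helper variable.
COMMON_DEPEND="dev-libs/libfoo"

# The dependency type lives in the variable name, so the shared part can be
# interpolated without repeating any label inside the value.
DEPEND="${COMMON_DEPEND} virtual/pkgconfig"
RDEPEND="${COMMON_DEPEND} app-misc/foo-data"

echo "DEPEND:  ${DEPEND}"
echo "RDEPEND: ${RDEPEND}"
```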

Your "cleaner" solution will fail once there are more conditional dependencies, like

As for eclasses, you're missing that the package manager merges the dependencies - they're not simply concatenated.

This is false. There are only two types of variables: incremental and non-incremental. Non-incremental variables are overridden, while incremental variables (*DEPEND, IUSE, and REQUIRED_USE) are concatenated: this is specified in PMS, Section 10.2. Of course, you can require that DEPENDENCIES be a third type of variable for which a "magic merge" occurs. But if a concept needs such exceptional, confusing treatment from the very beginning, just to be able to do halfway what it is supposed to do, it is a rather broken concept.

As for dep definition and parsing, go to hell with XML: way too much noise, and it needs either a dep on itself or a lot of bash magic.

Not only this: XML needs to be a fixed file. There is no possibility to use a slightly modified version via "if [[ $PV = 9999 ]]" or something similar: you always have to copy and edit the file, and if changes occur you have to edit several files.
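To make the point concrete, here is a minimal sketch of the kind of version-dependent tweak a bash-based ebuild allows. PV is normally set by the package manager from the ebuild file name; it is assigned by hand here only so the snippet runs on its own, and the atoms are merely illustrative.

```bash
# PV would normally come from the package manager; set here for the demo.
PV=9999

DEPEND="dev-libs/libfoo"
if [[ ${PV} == 9999 ]]; then
    # Live ebuild: sources come from version control (illustrative atom).
    DEPEND+=" dev-vcs/git"
fi

echo "${DEPEND}"
```

Since PV is part of the ebuild's identity rather than a user setting, the result is still the same for every user, so it can be cached as metadata.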

With one simple modification, it won't: require the DEPENDENCIES variable start with a label. I understand the reason for wanting it to default without a label to the most common dependency type but, as you've pointed out, the PMS concatenation mechanism makes that error prone.

- John
_________________
I can confirm that I have received between 0 and 999 National Security Letters.

With one simple modification, it won't: require the DEPENDENCIES variable start with a label.

Not only start with a label: whenever you add something, you must start with a label. And this is what makes it behave like *DEPEND, where you specify the label in the variable name, but with the difference that you have to retype the label. So it makes things not simpler but more complicated and more error-prone by requiring redundant typing "by policy": the package manager cannot verify that you have not forgotten to type the label when appending to a variable, since the package manager just runs the bash code and does not analyze it. And finally, you end up with lots of redundant information in the metadata, which also does not sound like the right[TM] concept.
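A toy model of the forgotten-label problem, under stated assumptions: labels are taken to be words ending in ":" (such as "build:"), unlabelled atoms are taken to default to "build+run", and eclass and ebuild values are simply concatenated. The classify function and all atoms are hypothetical; this is not how any package manager is actually implemented.

```bash
# Toy model only: prints "<label> <atom>" for every atom in the string.
classify() {
    local label="build+run" word
    local -a words
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case ${word} in
            *:) label=${word%:} ;;          # a new label takes effect
            *)  echo "${label} ${word}" ;;  # atom filed under current label
        esac
    done
}

# Hypothetical eclass contribution, ending under a "build:" label.
eclass_deps="build: dev-util/foo-tools"
# The ebuild author forgets the label and expects the default.
ebuild_deps="dev-libs/bar"

# Plain concatenation, as for the existing incremental variables.
merged="${eclass_deps} ${ebuild_deps}"

classify "${ebuild_deps}"   # what the author meant
classify "${merged}"        # what the package manager would see
```

In the merged string, dev-libs/bar silently ends up under "build" instead of "build+run", and nothing in the bash code flags the omission.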

I'd say stick with the short names (RDEPEND, HDEPEND and so on). For one, they don't break anything. For two, they're pretty simple (Well sometimes they can be overly complex (see portage's python dependency handling), but not usually and no other solutions would solve that) which seems important. ebuilds are great because they're VERY powerful, but quite simple, which is important.

This is what I think as well. Simple, short(ish) names are what we have been successfully using for over a decade. I really see no need to change that. Especially not to something overly verbose and complicated; leave that to Exherbo, which excels at that.

As for dep definition and parsing, go to hell with XML: way too much noise, and it needs either a dep on itself or a lot of bash magic.

Not only this: XML needs to be a fixed file. There is no possibility to use a slightly modified version via "if [[ $PV = 9999 ]]" or something similar: you always have to copy and edit the file, and if changes occur you have to edit several files.

Code:

<depend ifversion="9999">
...
</depend>

Even if XML doesn't get a green light, I still think it's a better strategy to use a ready parser and base the DEPENDENCIES syntax on YAML or JSON or anything similar (I'd go with YAML, because it's a lot more readable IMHO), because it would save the devs so much time and nerves, as they won't have to deal with lots of new bugs in their own parser, not to mention that a ready parser is likely to be faster than stuff written in bash or awk, and any error reporting would be on a higher level.

E.g. with YAML, they could just get the DEPENDENCIES variable from the ebuild through shlex and then let PyYAML (with bindings to a C library) do the parsing and provide portage with ready, pythonic structured data.

Some may say that a stand-alone parser just adds another dependency, but if you make your own parser, that gives you more code you have to support and fix on your own, so you "depend" on a bunch of additional code anyway. And extending it would require more work than just standardizing a few more mappings in a general data serialization language.

You can support one particular type of test (or maybe several), but you do not have the flexibility a programming language offers. Essentially, it is replacing the flexibility of a programming language with a static setup. Even the authors of *kit eventually realized that this is stupid (although they have drawn the wrong conclusions for a system's program, but this is a different story).

Quote:

Some may say that a stand-alone parser just adds another dependency, but if you make your own parser, that gives you more code you have to support and fix on your own, so you "depend" on a bunch of additional code anyway.

Or you just avoid such an artificially complex specification ("artificial" because it only makes things complex without giving the user any substantially new possibilities, rather the opposite) and use just the trivial parser which is needed for the current *DEPEND variables.

With one simple modification, it won't: require the DEPENDENCIES variable start with a label.

Not only start with a label, but whenever you add something you must start with a label. ...

Not really. Etal provided an elegant solution to that objection. However, I get it. Old variable style is simpler; adding new /.DEPEND/ variables is simpler; the prettier syntax (if you think it's prettier, that is) may not be worth the implementation and conversion cost.

- John

Incidentally, mgorny doesn't think the current parser is trivial. One of his objections to the DEPENDENCIES proposal is that the current parser is "spooky complex" already. Which gives me a wonderful idea for simplifying the parser, perhaps bringing it down to the "trivial" complexity that will have so many benefits. USE flag conditional dependency constructs like

Code:

foo? ( app-foo/foo )

could so easily be replaced with simple Bash constructs like

Code:

use foo && DEPEND+=" app-foo/foo"

that the extra complexity in the parser hardly seems warranted. Thoughts?

- John

That's nice, but I can see myself often forgetting this additional (white-)space. A simple add_to_depends() function taking care of this wouldn't be much overhead and would be easier to handle, imho.
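A minimal sketch of what such a helper could look like. add_to_depends is the name suggested above; it is not an existing eclass function, and the atoms are only for illustration.

```bash
# Sketch of the suggested helper; not part of any real eclass.
add_to_depends() {
    local atom
    for atom in "$@"; do
        # ${DEPEND:+${DEPEND} } expands to "<old value> " only when DEPEND
        # is non-empty, so the separating space cannot be forgotten and no
        # leading space is produced for the first atom.
        DEPEND="${DEPEND:+${DEPEND} }${atom}"
    done
}

DEPEND=""
add_to_depends "dev-libs/libfoo"
add_to_depends "app-foo/foo" "virtual/pkgconfig"
echo "${DEPEND}"
```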

This is not possible, because it happens at the wrong time: the metadata must be independent of your current USE flags. Otherwise, portage could not use metadata for dependency resolution but would have to execute all ebuilds for every emerge. The time would be unbearable.
It is as I said: the current *DEPEND is probably among the simplest syntaxes in which the required information can be stored in text form.
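A rough sketch of why the conditional has to stay inside the dependency string: the string is stored once, identically for all users, and only later evaluated against each user's USE flags. The resolve_deps function below is a toy evaluator written for this post, not Portage's parser; it handles only plain atoms, "flag? ( ... )" and "!flag? ( ... )" on a single line, and the atoms are made up.

```bash
# Toy evaluator, NOT Portage's parser.
# $1 = dependency string, $2 = space-separated enabled USE flags.
resolve_deps() {
    local use=" $2 " token flag enabled negate depth=0 next=1
    local -a tokens live=(1)   # live[depth]=1 while all enclosing conditions hold
    read -r -a tokens <<< "$1"
    for token in "${tokens[@]}"; do
        case ${token} in
            *\?)
                flag=${token%\?}
                negate=0
                if [[ ${flag} == '!'* ]]; then negate=1; flag=${flag:1}; fi
                if [[ ${use} == *" ${flag} "* ]]; then enabled=1; else enabled=0; fi
                if (( negate )); then enabled=$(( ! enabled )); fi
                # The next group is live only if its parent group is live too.
                next=$(( enabled && live[depth] ))
                ;;
            '(')
                depth=$(( depth + 1 ))
                live[depth]=${next}
                ;;
            ')')
                depth=$(( depth - 1 ))
                next=${live[depth]}
                ;;
            *)
                if (( live[depth] )); then echo "${token}"; fi
                ;;
        esac
    done
}

# This string is what gets cached; it never changes with the user's USE.
RDEPEND="dev-libs/libfoo foo? ( app-foo/foo ) !bar? ( app-misc/nobar )"

resolve_deps "${RDEPEND}" "foo bar"   # one user's settings
resolve_deps "${RDEPEND}" ""          # another user's settings
```

The same cached string yields different atom lists for the two users, which is exactly what "use foo && DEPEND+=..." at ebuild execution time could not provide without running the ebuild per user.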

You can support one particular type of test (or maybe several), but you do not have the flexibility a programming language offers. Essentially, it is replacing the flexibility of a programming language with a static setup. Even the authors of *kit eventually realized that this is stupid (although they have drawn the wrong conclusions for a system's program, but this is a different story).

…but you just said static setup is required anyway:

mv wrote:

This is not possible, because it happens at the wrong time: the metadata must be independent of your current USE flags. Otherwise, portage could not use metadata for dependency resolution but would have to execute all ebuilds for every emerge. The time would be unbearable.

John R. Graham wrote:

Incidentally, mgorny doesn't think the current parser is trivial. One of his objections to the DEPENDENCIES proposal is that the current parser is "spooky complex" already. Which gives me a wonderful idea for simplifying the parser, perhaps bringing it down to the "trivial" complexity that will have so many benefits.

How 'bout asking what exactly makes the parser complex before thinking about ways to optimize it?

Also, someone mentioned earlier that the metadata has to be readable by third-party tools (e.g. eix). A static setup (in regard to what Dr.Willy was talking about) in some language with accessible parsers would make a lot of things easier for everyone. Ya know, just sayin'...

Not static, only independent of system settings/features (like USE, ARCH, FEATURES, ACCEPT_KEYWORDS, ...): The metadata is supposed to be downloaded by all users.

Then in what way are they dynamic?

They can depend on anything else: EAPI, eclasses used directly or implicitly, PROPERTIES, RESTRICT, SRC_URI, P, SLOT (some of these are sometimes only calculated in eclasses and thus not explicitly known to the ebuild in advance), everything stored in files/, ...

I agree with mv that the current format of a dep-string is as simple as it can be, and it should not be made more complex. He's also shown quite convincingly how a single variable leads to difficulties with eclasses which can only be resolved by typing yet more information into an already complex format.

Merging all the variables requires a more complex parser, and is not as transparent (if something's hard to parse for a machine, it's usually hard for a tired human being too). Furthermore, it makes it harder to have different handling of each dependency type.

For example, if we were to add lib-dependencies, we might want to add the lib name/s that is/are linked to (though I personally don't think it's warranted, nor a good idea given the scanning of binaries that already occurs. That doesn't stop the link-dependency information being useful, since its omission accounts for a lot of the current head-scratching about "ABI" sub-slot operators which were designed for Java and Python plugins, to which they are suited, not link-time deps to which they are not, but for which they have been bastardised and presented as The Solution(tm). Too much "I know, we could also use this for.." to prove how clever a new idea is, and not enough "Do we really want to though?" afaic.)

In another case, suggested and recommended dependencies could go in one variable with a label, since they are not used as part of normal dependency calculation at all, but are there for the user-interface to optionally present after the normal calculation has occurred. It would also make sense to allow for future types of UI suggestion: confining them all to one variable makes sense, since they're outside the scope of PMS.

So, given that a single DEPENDENCIES variable adds complexity to routine calculations, while making ebuilds harder to work with, and restricts future possibilities, I really can't see the point. The question then is: should we switch to a different naming scheme for existing variables? I can see no benefit to it at this late stage.

If we were discussing a new format, we might prefer slightly more verbose names in line with the original BSD ports (see link above.) But it's far too late for that: why require such a massive shift in the way people are used to working?

WRT newbs learning to write ebuilds, my experience of it wasn't that the variable names were a problem; getting the damn thing to build at all was ;) and that was all about the package and its build-system, and tweaking the commands given to it.

The *DEPEND system has one major problem: the "type" classification is one-dimensional, but to accurately model the desired system you need a multi-dimensional classification (whether the desired fine-grained dependency system is a good or bad thing is another discussion).
Some dimensions that come to mind:
- ebuild phase (RDEPEND for preinst, DEPEND for compile, PDEPEND for postinst, FDEPEND for fetch, ...)
- install target (HDEPEND for host, TDEPEND for target, ...)
- conditionals (use? bla, if $PV == 9999 then FDEPEND += vcs, ...)
- for future possibly subpackage selection (DEPEND_COMMON, DEPEND_SERVER, DEPEND_CLIENT)
So I think the DEPENDENCIES approach is generally the right direction for the desired system, though the proposed syntax is awkward in my eyes. Probably because it still follows the same pattern of situation first, subject second, includes optional syntax elements and mixes different dimensions into a single namespace. Maybe I would like it better if you reversed it, e.g.

Basically, for each dependency explicitly state under which conditions it will be needed, so it is standalone and not dependent on context. Also, the problem "X is needed in these situations" seems more common to me than "in this situation I need X,Y,Z, in that situation I need A,X,Z".
Makes things more verbose for sure (so more typing required), but I'd like that better than having to backtrack possibly multiple nested layers of labels or cascaded variable assignments (RDEPEND+=COMMONDEPEND+=LIBDEPEND+=COMMONLIBDEPEND) when resolving "why-is-this-dep-installed-there" issues.
Mind that I've just made up this syntax, it's nowhere near a fleshed out proposal, so don't bother pointing out technical/conceptual issues, there will be lots I'm sure.

Long story short: generic solution > multiple specialized solutions.

Don't get me wrong, I pretty much like the current simple variables. However, they won't scale into the desired fine-grained system. And the system is simple because the number of variables is very limited, and semantic differences are minimal. But as said: whether that system is worth the effort, both technical and mental (more important), is a completely different discussion.

The *DEPEND system has one major problem: the "type" classification is one-dimensional, but to accurately model the desired system you need a multi-dimensional classification (whether the desired fine-grained dependency system is a good or bad thing is another discussion).
Some dimensions that come to mind:
- ebuild phase (RDEPEND for preinst, DEPEND for compile, PDEPEND for postinst, FDEPEND for fetch, ...)
- install target (HDEPEND for host, TDEPEND for target, ...)

God, every time I see those misnamed variables.. CBUILD, CHOST (aka 'target'), CTARGET (only relevant for a toolchain). That, like it or not, is how things are named in the cross-compilation world. Confusion arises because for an embedded developer, the target board is where things will run, and they are built on the host, so they see the host system as the build system (and there is no distinction between them).
From the point of view of the compilation system, the host is what we are building for, in order to compile things for the target.
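To pin the naming down, a small sketch of how the three tuples relate when building on an x86_64 machine for an ARM board. The tuples are only example values, and the printed configure line is illustrative rather than something a real ebuild would echo.

```bash
# Example tuples only; pick whatever matches your own setup.
CBUILD="x86_64-pc-linux-gnu"              # machine that runs the compiler
CHOST="armv7a-unknown-linux-gnueabihf"    # machine the built package will run on
CTARGET="${CHOST}"                        # differs only when building a toolchain

if [[ ${CBUILD} != "${CHOST}" ]]; then
    echo "cross-compiling: ./configure --build=${CBUILD} --host=${CHOST}"
else
    echo "native build: ./configure --build=${CBUILD}"
fi
```

In this naming, a dependency that must run during the build belongs to CBUILD, while one that the built package links against belongs to CHOST.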

At the moment, DEPEND = build-machine dependency. This will be the "new" HDEPEND.

CHOST DEPEND (ie must be installed in ROOT during build, a new category) is what DEPEND will be under proposed new EAPI.

So the new EAPI will fundamentally change what DEPEND means, instead of adding the new category to a consistently-named new HDEPEND.

The reason for this is that the google-chrome people have been using DEPEND to mean Host Dependency, and HDEPEND to mean Build Dependency.

Am I the only one who thinks that's crazy? You're switching the meaning of a fundamental variable, in a manner that is inconsistent with its historical meaning, the rest of the toolchain, and the rest of the world, so volunteer developers will have to constantly grapple with that cognitive dissonance (and the change in all their ebuilds since DEPEND no longer means build-machine dependency) in order to fulfil the needs of a commercial operation that would in all likelihood simply switch to the consistent variable, if that's the path you chose. After all, it would make sense, and they're paid to maintain it.

Quote:

- conditionals (use? bla, if $PV == 9999 then FDEPEND += vcs, ...)
- for future possibly subpackage selection (DEPEND_COMMON, DEPEND_SERVER, DEPEND_CLIENT)
So I think the DEPENDENCIES approach is generally the right direction for the desired system, though the proposed syntax is awkward in my eyes.

I think ferringb made a pretty decent case on the mailing-list for why a single DEPENDENCIES variable is more cache-friendly. He also showed how the package manager can infer that single variable from what it finds in the ebuild/eclasses, meaning that there is no reason the PM should not use such an internal format if it wants (or the three groups decide it is a better format across the board, for the cache.) So technically speaking, ebuild format does not need to change at all.

Quote:

Probably because it still follows the same pattern of situation first, subject second, includes optional syntax elements and mixes different dimensions into a single namespace. Maybe I would like it better if you reversed it, e.g.

Basically, for each dependency explicitly state under which conditions it will be needed, so it is standalone and not dependent on context. Also, the problem "X is needed in these situations" seems more common to me than "in this situation I need X,Y,Z, in that situation I need A,X,Z".

Well I like it better, as it shows the package up-front. That's nothing more than a preference though: for instance I can understand the argument for USE flags first, since that shows what will change when I add or remove one.

Quote:

Makes things more verbose for sure (so more typing required), but I'd like that better than having to backtrack possibly multiple nested layers of labels or cascaded variable assignments (RDEPEND+=COMMONDEPEND+=LIBDEPEND+=COMMONLIBDEPEND) when resolving "why-is-this-dep-installed-there" issues.

It's funny how people mention LDEPEND or LIBDEPEND whenever they want to pull in a new variable to argue with. Yet if you look at it, most of the times there is a COMMONDEPEND, it is precisely because the package provides a library that is linked to. No-one seems to be giving any thought to whether a library or link dependency is actually a pretty fundamental type of dependency.

Quote:

Mind that I've just made up this syntax, it's nowhere near a fleshed out proposal, so don't bother pointing out technical/conceptual issues, there will be lots I'm sure.

Long story short: generic solution > multiple specialized solutions.

Don't get me wrong, I pretty much like the current simple variables.

You've said that twice now :)
Personally I think the proposed new syntax, while technically feasible, is an obfuscated mess. It has the distinct disadvantage of being complex to type and to read, when dependency strings are already tricky enough, as others have pointed out.

Quote:

However, they won't scale into the desired fine-grained system. And the system is simple because the number of variables is very limited, and semantic differences are minimal. But as said: whether that system is worth the effort, both technical and mental (more important), is a completely different discussion.

Yet that is precisely the discussion which needs to take place: is it worth changing the basics of how ebuilds are written? What do we gain, especially when the technical benefits can be realised in the back-end without even changing format?

Please, do me a favour, and take a think about how things would look if we had LDEPEND from the beginning, ignoring all the other proposed new variables (I think that's reasonable, since LIB_DEPENDS has always been in ports.)

For instance, don't you think we'd have been scanning to check linkage didn't go outside the named set, and system? (After all, we scan binaries for QA stuff.) What impact would that have had on usage of revdep-rebuild and implementing --preserved-libs?

Would we be having the current discussion about sub-slot operators, or would that have simply been a non-controversial feature for java and python plugins, agreed on and implemented a year ago?

This sounds so simple (and is simple) because it loses a lot of functionality, most importantly the implicit changing of branches due to installed dependencies. For example,

Code:

DEPEND="a || ( b c )"

cannot properly be reflected. Not to speak about more complex cases when there is a different bracing needed for HDEPEND than for DEPEND. Also, your proposal does not cover easily the case if e.g. RDEPEND needs vcs[some useflag].
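To illustrate what "implicit changing of branches" means for "|| ( b c )", here is a toy model. It assumes the common behaviour of preferring an alternative that is already installed and otherwise taking the first one; pick_any_of and the "installed" list are hypothetical stand-ins, not package manager code.

```bash
# Toy model of settling an any-of group.
# $1 = space-separated installed packages, remaining args = alternatives.
pick_any_of() {
    local installed=" $1 " candidate
    shift
    for candidate in "$@"; do
        if [[ ${installed} == *" ${candidate} "* ]]; then
            echo "${candidate}"
            return 0
        fi
    done
    # Nothing installed yet: fall back to the first alternative.
    echo "$1"
}

pick_any_of "x c" b c   # c is already installed, so that branch is chosen
pick_any_of "x"   b c   # nothing installed, so the first branch is chosen
```

Which branch is active depends on the state of the system, so a per-dependency list of fixed conditions has no obvious place to record it.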

Concerning your remark about cache-friendliness, I have serious doubts whether the duplication in the cache is as large as the overhead of the new syntax. One should write an automatic converter to check how large it really is in the tree.