Since our mailing list seems to reach fewer and fewer people interested in the development of the ebuild format and PMS, I would like to try getting some feedback on the most debatable things here. I'm mostly wondering what our users think, not only the developers and the most active participants of the mailing lists, since this applies not only to Gentoo itself but to all distributions derived from it and to everyone running their own ebuild repositories or writing ebuilds for their own use.

For some time, we have been discussing the introduction of new dependency types to handle various issues which have arisen over time. There are a few proposals, and we have put a quick list of them in the wiki article New dependency types. Feel free to take a look at it, but please don't discuss it in this topic. It will be discussed later, after we decide on more important things.

And that important thing is whether we want to keep the dependency variables in ebuilds as they are now. In other words, if we really want to introduce more variables like FDEPEND, HDEPEND, IDEPEND... because those one-letter names are confusing to some people.

There are currently two alternatives to this. The shorter one is to simply use more verbose variable names -- either just for the new types, or for all of them. In other words, FETCH_DEPEND instead of FDEPEND, RUN_DEPEND instead of RDEPEND and so on.

The other one is to adopt a single DEPENDENCIES variable with magic syntax to specify which dependencies belong to which type. Since this is a long topic needing an example and a bit of description, and a few concerns have already been raised about it, instead of repeating it all here I'd like to point you to the wiki article DEPENDENCIES variable, where I tried to sum up what has been said already, and to the most recent relevant thread at gmane.

As a note, the DEPENDENCIES variable would apply only to ebuilds using a newer EAPI, EAPI=5 or =6 most likely. Existing ebuilds would not need to be modified until being migrated to the new EAPI.

What I would like to get from you is feedback. What do you think about these ideas? Which do you consider best for you and for Gentoo? What issues do you see?

If you have some specific constructive feedback, we'd really appreciate it. If you don't, no problem -- you can still use the poll to choose your preferred option ;). Thanks in advance.

I think the short variables are more consistent with the philosophy of very short programs written in a specialty language whose base syntax is intentionally terse. The relatively trivial amount of extra domain knowledge to handle the short names is, well, trivial.

- John

Shouldn't they be called "RUNDEP, BUILDDEP, …" or even "RDEP, BDEP, …" then?

In other words, if we really want to introduce more variables like FDEPEND, HDEPEND, IDEPEND... because those one-letter-names are confusing to some people.

This is not an argument. One can always find people who are confused by any conventional code.
A language convention is a convention.
If there is a rationale behind the convention and you can understand it, fair enough. If you can't, learn the convention by heart! Period.

Well, as you understand, there are numerous examples in which, in order to obtain the widest possible agreement on a symbol, you end up with... an even more confusing one. Confusing because it is meaningless.

I believe that as long as they are clearly described in the docs and don't stretch over several lines, their names can be more or less anything. Of course, it's better when the var names are easy to remember and I find *DEPEND quite intuitive. Also, stuff always becomes intuitive after long-term use, regardless of how confusing it was in the beginning.

for the alternatives:
AFAIK the main problem with exheres-type DEPENDS is the parser. So, here's my train of thought: we need a language with a good (and hopefully fast) parser that supports hierarchies and can modify branch meanings based on USE flags and other things. Sounds a lot like XML? So, let's have a DEPEND XML file alongside the ebuild with a dependency hierarchy in it, possibly using attributes to modify the meaning of dependencies. Python has good XML parsers, so portage could parse it on its own, probably faster than parsing BASH strings. More importantly, XML would be less confusing IMO.

Ah, please avoid FUD. I think I made it clear that existing ebuilds wouldn't be broken. It would be the ebuilds being transformed to new EAPI which would require changes (i.e. having EAPI line changed).

Well, to be honest, some modification will usually be required anyway. If we decide to add host dependencies, ebuilds switching to new EAPI will probably have to move some of the DEPEND to HDEPEND. However, failing to do so wouldn't break stuff for the 'common' Gentoo installs.

for the alternatives:
AFAIK the main problem with exheres-type DEPENDS is the parser. So, here's my train of thought: we need a language with a good (and hopefully fast) parser that supports hierarchies and can modify branch meanings based on USE flags and other things. Sounds a lot like XML? So, let's have a DEPEND XML file alongside the ebuild with a dependency hierarchy in it, possibly using attributes to modify the meaning of dependencies. Python has good XML parsers, so portage could parse it on its own, probably faster than parsing BASH strings. More importantly, XML would be less confusing IMO.

First of all, please note that every 'universal' language or parser will always be slower than a dedicated solution. Well, as long as we're assuming we are able to implement the parser properly. The dependency syntax can be tokenized quite easily.
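To illustrate the point about tokenizing, here is a minimal sketch that splits a dependency string into tokens on whitespace. The atoms and the USE flag are made up for illustration, and the sketch assumes that every token (including parentheses and ||) is separated by whitespace; it does no validation or nesting checks.

```shell
# Hypothetical dependency string; the atoms are invented for this example.
deps='foo/bar
    ssl? ( dev-libs/openssl )
    || ( app/one app/two )'

# Print one token per line.
tokenize() {
    set -f        # disable globbing so "?" and "*" are not expanded
    set -- $1     # unquoted on purpose: split on spaces, tabs and newlines
    set +f
    printf '%s\n' "$@"
}

tokenize "$deps"
```

A real parser would still have to match the parentheses and evaluate the conditionals, but the token stream itself is cheap to produce.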

IIRC, the *DEPEND strings are parsed in BASH. I believe an XML parser written in C/C++ (expat, libxml2) with Python bindings would outperform it. That is what I was getting at.
Even if it were slower, it is unlikely that the XML parser would fail on a validated XML file. With complicated DEPEND strings you may not be able to ensure that.
My main point is: why reinvent the wheel and invent another language with many initial bugs, if you can use a time-proven standard with many fast parsers?

That code block was purely illustrative, I'm not sure it would arise.
Also, that was a good idea with <pkg>, because the <pkg category=""> syntax would ensure that you don't have to parse the string after that. OTOH, you don't need to know the category separately at build time, but maybe it could be useful some other time. What I meant to say is that you can let the XML parser separate the stuff; you don't have to design your own parser for everything. Of course, in this case str.split('/') is most likely faster.
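For comparison, the same category/package split needs no parser in the shell either. This is a small sketch using parameter expansion; the atom is a hypothetical example and is assumed to contain exactly one "/" and no version or slot suffix.

```shell
atom='dev-lang/python'   # hypothetical category/package pair

category=${atom%%/*}     # drop everything from the first "/" onwards
package=${atom#*/}       # drop everything up to and including the first "/"

printf 'category=%s package=%s\n' "$category" "$package"
```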

I'd say stick with the short names (RDEPEND, HDEPEND and so on). For one, they don't break anything. For two, they're pretty simple, which seems important. (Well, sometimes they can be overly complex, see portage's python dependency handling, but not usually, and no other solution would solve that.) Ebuilds are great because they're VERY powerful but quite simple.

Also, the short names are kind of nice because they are one thing that makes you read a BIT of documentation before you start hacking on an ebuild with no idea what you're doing.

My only wish is that the name of DEPEND would change because with all these potential new dependency types, DEPEND has become less and less clear.

I have always thought that RDEPEND stands for REVERSE_DEPEND.
I think that long names, like RUN_DEPEND, are better for people like me (not familiar with programming, scripting or ebuild EAPIs).

I think you should use the long names, because more explicit names mean easier learning, and if you are asking users, you probably already have the idea of making things easier for "common" users and opening up ebuilds to the masses. And as C vs. assembly has already proven, the more human-readable something is, the easier it is, and the more "common" people will use it.

And not only that: you already seem scared to add a new layer for compatibility, and I can see the "640k ought to be enough for anyone" symptom coming. Just look at your new proposals: BADEPEND is proof that the short names won't be enough for long, and that means breaking things again soon to adapt to yet another version.

Drop the short-name standard, as it is already showing its limits anyway (and will be unhandable (is that a word?) soon).

I'd like to change my vote, although there's no mechanism to do so. I mistakenly thought the poll was about variable naming conventions, a topic I find relatively uninteresting (didn't read mgorny's original post carefully enough); I voted essentially for the status quo.

Now that I've paid appropriate attention and caught up with my [gentoo-dev] mailing list reading, I see that the 3rd option is a new way to specify dependencies that seems pretty clean and well thought out. I think I'd vote for it. For reference, here's the original post on the mailing list archive. I think it's worth a read.

- John

So there is no need for an additional parser: All code duplication can be eliminated by just storing it into separate shell variables (you can easily make the "foo?" check more complex by redefining c="foo?"). There is no need to implement an additional parser for reading metadata (which is horribly bad for third-party tools) if the result of this parsing can be produced just in a shell-appropriate manner and stored as-is in the metadata.
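The shell-variable approach described here can be sketched roughly as follows. The atoms, the USE flag and the helper variable names (c, common, build_only) are hypothetical and chosen only to show the technique: shared pieces are stored once in plain variables and reused, so the package manager only ever sees ordinary *DEPEND strings.

```shell
# Hypothetical atoms and USE flag; only the technique matters.
c='foo?'                        # the condition, defined in one place
common='dev-libs/libbar'        # needed at build time and at run time
build_only='dev-util/baztool'   # needed only while building

# The shell expands the variables, so the stored metadata contains
# nothing but ordinary dependency strings.
RDEPEND="${common} ${c} ( dev-libs/libfoo )"
DEPEND="${RDEPEND} ${build_only}"

printf 'RDEPEND=%s\nDEPEND=%s\n' "$RDEPEND" "$DEPEND"
```

Making the condition more complex only requires changing the single assignment to c.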

I see that the 3rd option is a new way to specify dependencies that seems pretty clean and well thought out.

DEPENDENCIES is just putting a completely unnecessary layer over bash variables: namely, it allows you to define and use variables (confusingly called "labels") within the DEPENDENCIES variable.
[…]
So there is no need for an additional parser: All code duplication can be eliminated by just storing it into separate shell variables (you can easily make the "foo?" check more complex by redefining c="foo?"). There is no need to implement an additional parser for reading metadata (which is horribly bad for third-party tools) if the result of this parsing can be produced just in a shell-appropriate manner and stored as-is in the metadata.

That observation is true indeed.

But isn't a parser needed to resolve ||(x y z) and foo?(x y) anyway? (Just saying, not that I would want that parser to be more complicated than needed)

@mv, regarding your comment about the unnecessary additional layer, I tend to view both the proposed DEPENDENCIES variable and the existing ?DEPEND variables at the same hierarchical level above the dependency resolver: not an additional layer but a different syntax for feeding the same input.

I also have to say that, looking at the example you copied out of the DEPENDENCIES proposal and your (quite correct) implementation of the same structure with traditional Bash and Portage mechanisms, in my opinion, the former is more readable and probably more easily maintainable over time than the latter.

I wasn't even considering the cost to 3rd party tools but, in fact, there's also a cost to internal infrastructure (e.g., eclasses). This is probably the biggest barrier to easy adoption. However, it's okay to consider (what some may believe to be) improvements, even if they come at a cost.

- John

Objectively speaking, I like DEPENDENCIES as well. It looks clean, does not bring much additional complexity (As Dr.Willy pointed out, DEPENDS are not pure bash either), and it looks pretty simple to implement.

On the other hand, it's coming from Ciaran, and plenty of people don't like him (for good reasons).

Sure, but this is the simplest parser which you can have for a text representation of this kind of data.
And to me, it is clear that the metadata cache should contain the simplest such representation which is available: Reasons are the mentioned third party tools, and also not spending unnecessary time for parsing when emerge is running. (The latter is probably not very crucial, but anyway it is an unnecessary overhead.)

Another difficulty is that, as simple as it might look when used in only one variable assignment, the thing becomes horrible if you use several variable assignments (which is necessary for eclasses, and probably also happens for ebuilds which serve several purposes simultaneously, e.g. as live and non-live ebuilds):
The point is that with DEPENDENCIES (in contrast to *DEPEND) the result depends on the order in which you add natural "blocks" of dependencies.

Do you immediately see that this code has probably at least 2 bugs?
In fact, the default label for DEPENDENCIES is "build+run". However, if the code added by python-49 does not end in "build+run:" and is added in front of your definition, then the dependency can end up in another (unexpected) label. Of course, this can be eliminated by requiring that if eclasses add something, they must always add "build+run:" at the end. But forcing redundant data into the metadata by policy because of a bad concept is not good, either.

Moreover, there is another problem: graulty/bazola is added either to build+run (if python-49 ends with build+run:) or to build (if $PV is 9999), which is probably not what was intended. Again, you can avoid this possible oversight with a policy of adding the label explicitly for every atom which you add to DEPENDENCIES, but again this means a lot of unnecessary redundancy, which is exactly what DEPENDENCIES claims to avoid in the first place.
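The order dependence can be made concrete with a small sketch. It assumes that fragments are joined by plain string concatenation, that a label is any whitespace-separated token ending in ":", and that the default label is "build+run" as stated above. The function name active_label and the fragment contents are hypothetical, and label scoping inside parentheses is ignored.

```shell
# Print the label that is in effect at the end of a DEPENDENCIES fragment.
active_label() {
    local label tok
    label='build+run'            # default label when none was given
    set -f                       # no globbing while splitting into tokens
    for tok in $1; do
        case $tok in
            *:) label=${tok%:} ;;   # a token ending in ":" switches the label
        esac
    done
    set +f
    printf '%s\n' "$label"
}

eclass_part='build: my/vcs'      # hypothetical fragment that ends under "build"
ebuild_part='graulty/bazola'

# Ebuild fragment first: its atom is read under the default label.
active_label ''                  # prints: build+run
# Eclass fragment first: the same atom would land under this label instead.
active_label "$eclass_part"      # prints: build
```

Under these assumptions, the label an appended atom receives is decided entirely by whatever text happens to precede it.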

So my opinion is that the DEPENDENCIES concept is not as well thought through as Ciaran claims it to be: It makes things more difficult for the package manager, for third party tools, for eclass authors, and even presents new traps for ebuild authors.

Edit: Another thing is the metadata itself: So far, this data was consistent over all EAPIs. Although there is no guarantee that this will remain so for all future EAPIs, it would be nice to keep this tradition as long as possible: Otherwise even less complex third-party tools would have to be EAPI aware.

(Sorry for the several edits - I inserted the last remark on the wrong place.)

Last edited by mv on Mon Sep 10, 2012 3:24 pm; edited 3 times in total

Perhaps a good compromise is combining both: To support only *DEPEND in the metadata, but to allow DEPENDENCIES in ebuilds (but not in eclasses), and portage immediately parses DEPENDENCIES to add them to the corresponding *DEPEND variables. This way, the parsing would have to be done only once, and all could be happy (except for third-party tools which are not able to read the metadata but must parse the ebuild - e.g. eix would have to be extended or ignore DEPENDENCIES in overlays which provide no metadata)
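A rough sketch of such a one-time translation is shown below. It is an illustration under stated assumptions, not the proposed implementation: only the labels "build", "run" and "build+run" are handled (the "run" label is assumed by analogy), the default label is "build+run", and labels nested inside parentheses are not supported. The function name split_dependencies is hypothetical.

```shell
# Translate a flat DEPENDENCIES string into DEPEND and RDEPEND.
split_dependencies() {
    local label tok
    label='build+run'
    DEPEND=''
    RDEPEND=''
    set -f                               # no globbing while splitting
    for tok in $1; do
        case $tok in
            *:) label=${tok%:}; continue ;;   # label token: switch and move on
        esac
        case $label in
            build)     DEPEND="$DEPEND $tok" ;;
            run)       RDEPEND="$RDEPEND $tok" ;;
            build+run) DEPEND="$DEPEND $tok"; RDEPEND="$RDEPEND $tok" ;;
            *)         echo "unsupported label: $label" >&2 ;;
        esac
    done
    set +f
    DEPEND=${DEPEND# }                   # strip the leading separator
    RDEPEND=${RDEPEND# }
}

split_dependencies 'foo/bar build: my/vcs graulty/bazola'
printf 'DEPEND=%s\nRDEPEND=%s\n' "$DEPEND" "$RDEPEND"
```

With this input, foo/bar ends up in both variables and the two atoms after "build:" end up only in DEPEND, so tools reading the metadata would see nothing but the traditional variables.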

Do you immediately see that this code has probably at least 2 bugs?
In fact, the default label for DEPENDENCIES is "build+run". However, if the code added by python-49 does not end in "build+run:" and is added in front of your definition, then the dependency can end up in another (unexpected) label. Of course, this can be eliminated by requiring that if eclasses add something, they must always add "build+run:" at the end. But forcing redundant data into the metadata by policy because of a bad concept is not good, either.

Moreover, there is another problem: graulty/bazola is added either to build+run (if python-49 ends with build+run:) or to build (if $PV is 9999), which is probably not what was intended. Again, you can avoid this possible oversight with a policy of adding the label explicitly for every atom which you add to DEPENDENCIES, but again this means a lot of unnecessary redundancy, which is exactly what DEPENDENCIES claims to avoid in the first place.

I don't think this is right.

First off, you can write it like this, and it will be cleaner:

Code:

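inherit python-49

# Only the live (9999) version needs the VCS tool at build time.
[[ $PV == 9999 ]] && BUILD_DEP="my/vcs"

# Atoms before the first label use the default label; "build:" applies to what follows it.
DEPENDENCIES="
foo/bar
build:
${BUILD_DEP}
graulty/bazola"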

in which case it's much clearer what goes where.

As for eclasses, you're missing that the package manager merges the dependencies - they're not simply concatenated. That is, you don't write DEPENDS+=" my/dep".

For naming, I would prefer LONG_THINGS. For one, they're easier for newbs to read, because they can give away the purpose easily. Secondly, A-Z is a 26-character limit; I don't know what the future holds, but two types starting with the same letter may occur, and naming one of them with a different starting character would be confusing.

As for dep definition and parsing, to hell with XML: way too much noise, and it needs either a dependency of its own or a lot of bash magic. On the other hand, JSON is pretty readable and powerful, and can easily be handled in bash or awk.