I need to ensure that the runtime dependencies I ship in my distribution come from a specific repository that passes some licence check. Plugins and compile-only dependencies don't have this restriction.

A repository was added to provide only some specific dependency, but I don't want Gradle to ask it for other dependencies. Maybe it is slower or less trustworthy than others.

+1
I figured that a way to make the situation a bit better is to think carefully about the order in which you declare your repositories. Where I work we have a private Nexus repo that is sometimes very slow to resolve dependencies. Moving it to the bottom of the list made our lives much easier.

oehme changed the title from "Provide support for declaring repositories on a per-artifact basis" to "Reduce overhead of having many repositories" on Sep 19, 2017

I've adjusted the title of this to refer to the actual problem. Assigning a repository to a dependency might not be the best solution. There are alternatives like remembering that a certain module was absent from a repository and not checking it again when we want to see if there is a new version.

Another alternative would be to do matching of repositories based on certain attributes.

For my usecase, I care less about the performance problem, and more about being able to assign a specific dependency to a specific repository for correctness reasons.

Example story:

a jar from mavencentral has a bug / needs a feature

I modify the jar to suit my needs

I publish it to our local 3rd party repo under a new version

at some point in the future, a new jar with the same version is published to mavencentral
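
The republish step in this story could be sketched with the `maven-publish` plugin. This is only a sketch: the repository URL, coordinates, and jar path below are placeholders, not real infrastructure.

```groovy
// Sketch only: URL, coordinates, and jar path are placeholders.
apply plugin: 'maven-publish'

publishing {
    publications {
        patchedLib(MavenPublication) {
            groupId 'com.example'
            artifactId 'somelib'
            // A clearly distinct version, so a later upstream release
            // with the original number can never shadow it.
            version '1.0.1-MYCOMPANY-PATCH'
            artifact file('build/libs/somelib-patched.jar')
        }
    }
    repositories {
        maven { url 'https://nexus.mycompany.com/repository/thirdparty/' }
    }
}
```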

I might be in the minority here, but I imagine it's a pretty common situation for a project to be "we get all our deps from mavenCentral, except this one vendor library which comes from the vendor's maven repo, and this one hacked-up library we check in to our 'libs' directory". I guess it's a sticky issue figuring out how to resolve the transitives...

@netwigg The way I've handled your use case in the past is to deploy your "fixed/hacked" artifact under an expressly different GAV coordinate. It may just be that the version is overtly different (e.g. "1.0.1" --> "1.0.1.MYCOMPANY_PATCH").

Another way is to prepend the groupId or artifactId with yours. Back in the day, SpringSource did this with their OSGi Enterprise Bundle Repository. They took a ton of OSS libraries that were not OSGi'd, modified their MANIFEST.MF, repackaged them, and deployed them with the exact same groupId and version, but with the artifactId prepended with "com.springsource". For example, "org.apache.commons:commons-io" became "org.apache.commons:com.springsource.commons-io".

I like prepending to the group or artifact ID over changing the version, just to make it clear it is in fact a "different" artifact. It also better avoids collisions.

Publishing it under a different group is a very good approach. You can use a substitution rule to tell Gradle to replace any appearance of the original with your modified one. Then you don't rely on repository order for correctness.
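
In the Groovy DSL, such a substitution rule might look like the following sketch; the module coordinates are placeholders for your own.

```groovy
// build.gradle — sketch only; 'com.example:somelib' and the patched
// coordinates are placeholders, not real modules.
configurations.all {
    resolutionStrategy.dependencySubstitution {
        // Every request for the original module is redirected to the patched one.
        substitute module('com.example:somelib') with module('com.mycompany:somelib:1.0.1.MYCOMPANY_PATCH')
    }
}
```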

The problem with the GAV changes is that they propagate to projects which consume your project (for which the modified lib is a 4th-party dependency), but those consumer projects might not have access to the same repositories you have. You care where the libs in your distribution build come from, but you don't care about, or control, where consumer projects get them from.

Another use case for this which I think is fairly common at companies, and which shows that this is not only about performance: as per company policy, 3rd-party dependencies have to be pulled from a certain "approved" repository, which is basically the same as Maven Central but with some additional features like a virus scanner.

I don't understand the two points above. Can you please elaborate how the original proposal of "repositories on dependencies" would have solved them?

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work. They can't just use the unmodified version, since that's not going to work correctly with your project. But if they really want to try anyway, they can use a dependency resolution rule to adjust that dependency back to the unmodified version.

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

How exactly can you validate that no other repo is used today? Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project. For example, to keep the maintenance effort manageable, that repo often only contains the dependencies you redistribute (runtime dependencies). However, your project might also have test dependencies, integration test dependencies, compile-only dependencies, plugins, etc., which are not available in that repository. Therefore the only solution I see today is to rely on repository ordering, which we both agree is not a safe/reliable solution. Trusting developers to specify the repository on each dependency/configuration would be a reliable solution, because with that your build fails if a certain configuration isn't available in the repo you said it should come from.

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work.

What if the lib isn't modified, but just has to be pulled from a different (private) repo for policy reasons? I admit that this use case is a bit far-fetched and might not be as common as the "need to use repo X for policy reasons" use case. Dependency substitution on the consumer side would be a fix, but that complicates consumer projects quite a bit. With this ticket, this becomes easier since you don't need to change the GAV.

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.
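
Such a check might be sketched as a verification task like the one below. The allowlist URL is a placeholder, and a real plugin would likely hook into dependency resolution rather than just inspecting the declared repositories.

```groovy
// Sketch only: the allowlist URL is a placeholder for your approved repository.
def trustedUrls = ['https://nexus.mycompany.com/repository/approved/']

task verifyRepositories {
    doLast {
        // Fail the build if any declared Maven repository is not on the allowlist.
        repositories.withType(MavenArtifactRepository).each { repo ->
            if (!trustedUrls.contains(repo.url.toString())) {
                throw new GradleException("Untrusted repository declared: ${repo.name} (${repo.url})")
            }
        }
    }
}
```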

Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project.

Plugins, compile-only dependencies, etc. can all compromise the code that you deliver. Compromised testing libraries could help an attacker hide such issues. Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Trusting developers to specify the repository on each dependency/configuration would be a reliable solution

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.

That doesn't solve anything for the case I described, since what counts as "trusted" depends on the artifact/configuration.

Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Let's not discuss whether such policies make sense or not. Fact is such policies exist and developers have to deal with them.

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

I don't really get this point. The goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serves as proof that the build implements the policy. Whether the build fails because a plugin isn't present or because something isn't configured correctly in the build script itself doesn't really matter. The issue is that with Gradle today you can only fail the build if a dependency cannot be found in any of the specified repositories. But what is needed is to fail the build if certain dependencies cannot be found in certain specified repositories.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

That would be great. I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

The goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serves as proof that the build implements the policy.

So what does a green build tell you in this case? It could be implementing the policy (and no bad dependencies were used) or it might not be implementing the policy (and it might use bad dependencies). It's green either way.

I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

That's because it went into a technical proposal too early. The context section mentions the actual problem - I add a repo just for a single dependency and everything else gets slower. By now it's gotten hard to tell who added a +1 for "make it faster" and who added a +1 for "make policy easier".

I think it's all about what the original Expected Behavior stated:

Individual dependencies can declare their repository (or repositories). At runtime, the dependencies are only resolved from the declared repositories. The build fails if a dependency cannot be found in its declared repositories.

Edit: this is a useless debate, and really unproductive. But both ideas seem necessary? Split the ticket if you need to, just stop debating.

oehme changed the title from "Reduce overhead of having many repositories" to "Allow matching repositories to dependencies" on Sep 28, 2017

Another thing to take into consideration here is if Gradle eventually provides plugins with the ability to provide custom dependency types along with custom dependency resolution/repositories (somewhat mentioned in #1400).

I believe this issue is quite sensitive. We run our own repository for nightly snapshots, and I'm amazed to see in our access logs very large Fortune companies aggressively trying to access hundreds of dependencies. We get to know who uses what and which versions (and so do Maven Central and the other repos), which is quite a security leak. Plus, it scares me that we could even serve some of those artifacts (if we were bad guys).

This issue would solve problems like 'javax.mail:mail:1.3.1': in Central (and a few other repos) there is only a POM but no jar; the jar is in another repo. So, to make Gradle survive the jar missing from Central, I have to put that specific repo first.

So the moment I hit the same problem with a different repo, I won't be able to build my project with Gradle?

What is currently stopping work being carried out on this? We also have this problem, with public repositories even being spammed with requests for non-existent artifacts. Would it not suffice to add a whitelist/blacklist (includes/excludes) attribute to a repository block to either:

include resolution of all matching dependencies

exclude resolution of all matching dependencies

I would be willing to help out if someone would like to point me in the right direction.
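
For reference, Gradle later shipped a feature along these lines: repository content filtering (Gradle 5.1+). A sketch with placeholder URLs and groups:

```groovy
// Sketch only: the URL and group names are placeholders.
repositories {
    maven {
        url 'https://nexus.mycompany.com/repository/internal/'
        content {
            // Only ever ask this repo for our own artifacts.
            includeGroup 'com.mycompany'
        }
    }
    mavenCentral {
        content {
            // Never leak internal coordinates to a public repository.
            excludeGroupByRegex 'com\\.mycompany.*'
        }
    }
}
```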

At this point we're collecting use cases for this, to make sure filtering, or matching, is the right solution for each one. We see advantages in implementing this, but we'd like to make sure the use cases are real, so starting with a list of use cases and possible solutions to them is a good start. Then we can decide if, how, and when we implement it.

Be able to tell Gradle: please check nexus.mycomp.com/my-releases for any artifacts having the group mycomp.

I also second @ar's comment. The information found in the incorrect requests being sent out can be very revealing, not to mention dangerous.

We are currently solving some of the problems with rewrite rules on the webserver serving our nexus, catching and sending back 404's for any requests known to be incorrect - but this is a really ugly solution to maintain.

Do you know when you'll be finished collecting information for use cases? Just asking since the original ticket was opened in 2010 😃
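
The nexus.mycomp.com wish above maps fairly directly onto Gradle's later `exclusiveContent` API (Gradle 5.2+). A sketch, reusing the URL and group from the comment above:

```groovy
repositories {
    exclusiveContent {
        forRepository {
            maven { url 'https://nexus.mycomp.com/my-releases' }
        }
        // Artifacts in this group come only from the repo above;
        // no other repository is ever asked for them.
        filter {
            includeGroup 'mycomp'
        }
    }
    mavenCentral()
}
```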

We've been working on dependency management features intensively for the past months, and this issue has been mentioned several times. We were close to implementing it, but always found better ways to solve the use cases we originally thought would need it. This doesn't mean such use cases don't exist. Performance improvement is another one. It is unlikely we will do anything on this before October, as we have many more features to finish first.

Usually our customers having such issues work around them by having an internal repository which also does proxying, but we reckon it's not always that simple.

We use an internal nexus with cached/proxied remote repositories for jcenter, maven-central and Co - the default negative cache in Nexus is 1440 minutes, meaning the remote repository will only be contacted once per day for something it doesn't have.

Once each repository has been inspected for the module, Gradle will choose the 'best' one to use.

If I understand correctly (but maybe I've got this part wrong), every configured repository will be contacted on every build with an empty Gradle dependency cache (~/.gradle/caches/modules-2), regardless of the ordering of the repositories. So, even if we declare the internal repository first and the dependency is found there, the other irrelevant repositories are checked regardless. Is this correct?

My use case comes from issues I've run into on multiple occasions where jcenter hosts old or unofficial artifacts that I need to get from another repo (e.g. Firebase artifacts that should come from Google but are instead resolved from jcenter, causing versioning issues; declaring google before jcenter fixes that issue but causes others).

Another one (that prompted me to search for this) is that companies host their own artifacts, but adding their repository slows down the build, or worse, is somehow used to download all artifacts (even though it is specified last in repositories). My current issue is with Cloudflare's mobile SDK which is: