solr-commons-csv must not use the org.apache.commons.csv package

Details

Description

The solr-commons-csv artifact reused the code from the Apache Commons CSV project but the package wasn't changed to something else than org.apache.commons.csv in the process. This creates a compatibility issue as the Apache Commons team works toward an official release of Commons CSV. It prevents Commons CSV from using its own org.apache.commons.csv package, or forces the renaming of all the classes to avoid a classpath conflict.

Ryan McKinley
added a comment - 06/Mar/12 02:05 Solr is using a snapshot commons CSV build (commons-csv-1.0-SNAPSHOT-r966014) – solr does not change anything and would prefer to use a release build.
Has there been class refactorings that make replacing the .jar with the dev build impossible? If so, perhaps we should just update the .jar file we are including.

As far as I can determine, this is just a maven meta-data issue?
i.e. it doesn't seem like we should change java package names, but just make sure that the maven meta-data is completely distinct from any official commons-csv releases? (disclaimer: I still really know nothing of maven, other than it's becoming more a pain all the time

Yonik Seeley
added a comment - 06/Mar/12 02:09 Background here: http://markmail.org/message/lsxtxoetpw7a47uf
As far as I can determine, this is just a maven meta-data issue?
i.e. it doesn't seem like we should change java package names, but just make sure that the maven meta-data is completely distinct from any official commons-csv releases? (disclaimer: I still really know nothing of maven, other than it's becoming more a pain all the time

Ryan McKinley
added a comment - 06/Mar/12 02:24 Emmanuel, can you confirm this is just a maven issue?
Can it be solved with adding exclusions?
<dependency>
<groupId> org.apache.solr </groupId>
<artifactId> solr-core </artifactId>
<version> ${solr.version} </version>
<exclusions>
<exclusion>
<groupId> org.apache.solr </groupId>
<artifactId> solr-commons-csv </artifactId>
</exclusion>
...
Obviously the best approach is to work with an official release... but until that exists, what do you suggest? (Actually forking the project and changing the package names seems pretty bad)

Hi all and thank you for your quick reply. This is not a "simple" Maven metadata issue. Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects (The Apache Commons teams even starts receiving bug reports for solr-commons-csv). When Commons CSV is officially released there will be a conflict if solr-commons-csv and commons-csv are both brought to the classpath through the transitive dependencies of a project, because they have the same classes in the same package.

Imagine a project using Solr and doing CSV stuff with Commons CSV, when the CSVParser classes is instantiated it can be either the one from solr-commons-csv or the one from Commons CSV, this is guaranteed to break badly.

The right approach is to change the package of solr-commons-csv, for example to org.apache.solr.commons.csv, so both solr-commons-csv and Commons CSV can coexist on the classpath without conflict. This can be automated with the Maven Shade plugin.

Emmanuel Bourg
added a comment - 06/Mar/12 08:09 Hi all and thank you for your quick reply. This is not a "simple" Maven metadata issue. Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects (The Apache Commons teams even starts receiving bug reports for solr-commons-csv). When Commons CSV is officially released there will be a conflict if solr-commons-csv and commons-csv are both brought to the classpath through the transitive dependencies of a project, because they have the same classes in the same package.
Imagine a project using Solr and doing CSV stuff with Commons CSV, when the CSVParser classes is instantiated it can be either the one from solr-commons-csv or the one from Commons CSV, this is guaranteed to break badly.
The right approach is to change the package of solr-commons-csv, for example to org.apache.solr.commons.csv, so both solr-commons-csv and Commons CSV can coexist on the classpath without conflict. This can be automated with the Maven Shade plugin.

Robert Muir
added a comment - 06/Mar/12 12:35 This is a maven issue. solr-commons-csv only exists to maven.
Otherwise these are just classes inside the war file (essentially an implementation detail of solr).
It seems to me, no offense, that the solution is to make an official release of commons-csv!
Then nothing needs to coexist, as we would rather use a release instead anywaay, and all
grandfather poms,wagons, nexuses,surefires,whatever in maven will be adjusted and everyone is happy.

I agree a release of Commons CSV is desirable, but that's not a reason for using the namespace of other projects. The artifact is free to use by anyone from the Maven repository, it's no longer an implementation detail of Solr, people are starting to use it in the wild, and this will lead to classpath conflicts.

Repackaging a Commons component and renaming its package isn't uncommon, Tomcat does this for DBCP:

Emmanuel Bourg
added a comment - 06/Mar/12 12:52 I agree a release of Commons CSV is desirable, but that's not a reason for using the namespace of other projects. The artifact is free to use by anyone from the Maven repository, it's no longer an implementation detail of Solr, people are starting to use it in the wild, and this will lead to classpath conflicts.
Repackaging a Commons component and renaming its package isn't uncommon, Tomcat does this for DBCP:
http://www.jarvana.com/jarvana/inspect/org/apache/tomcat/tomcat-dbcp/7.0.23/tomcat-dbcp-7.0.23.jar
Tomcat is released with a tomcat-dbcp artifact containing the DBCP classes but under the org.apache.tomcat.dbcp package. Solr should adopt the same approach until Commons CSV is officially released.

Robert, should I appreciate that you are negating the issue? This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.

Emmanuel Bourg
added a comment - 06/Mar/12 13:25 Robert, should I appreciate that you are negating the issue? This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.

It is not an issue, as Solr is built with Apache Ant, Apache Maven is not fully supported and only "for convenience". If it works, it works, if not its of minor priority. It definitely does not block a release.

Uwe Schindler
added a comment - 06/Mar/12 13:32 It is not an issue, as Solr is built with Apache Ant, Apache Maven is not fully supported and only "for convenience". If it works, it works, if not its of minor priority. It definitely does not block a release.

Robert Muir
added a comment - 06/Mar/12 14:10
This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.
Really? who will stop us?
Seriously, all solr does is put commons-csv-1.0-SNAPSHOT-r966014.jar in solr/lib (along with LICENSE/NOTICE information).
We don't use any commons-csv-1.0-RELEASE.jar because one doesnt exist.

This is not a maven problem. You are releasing code in the Commons namespace that eventually will clash. I am sure everyone is on board to get this resolved by releasing commons-csv ...but if you do a release beforehand - please be a good ASF citizen and use some kind of shading technique (e.g. using jarjar).

Torsten Curdt
added a comment - 06/Mar/12 14:29 Folks, how can you not see that this is an issue?
This is not a maven problem. You are releasing code in the Commons namespace that eventually will clash. I am sure everyone is on board to get this resolved by releasing commons-csv ...but if you do a release beforehand - please be a good ASF citizen and use some kind of shading technique (e.g. using jarjar).
Just closing this issue as "Won't Fix" is not helping.

Emmanuel Bourg
added a comment - 06/Mar/12 14:33 Really? who will stop us?
Your morality, kindness and innate sense of teamwork among Apache developers ?
You may have not noticed that solr-commons-csv is in the central Maven repository, it's not just secretly included in the solr/lib directory:
http://search.maven.org/#artifactdetails|org.apache.solr|solr-commons-csv|3.5.0|jar
That means two things:
People are using this artifact in their Maven projects, independently of Solr
Anyone importing the solr-core artifact in its project will be unable to use Commons CSV when it's released

Robert Muir
added a comment - 06/Mar/12 14:36
You are releasing code in the Commons namespace that eventually will clash
We aren't releasing any code.
We have a library in solr/lib. we are depending upon this library. we don't release it.
This issue is only about maven, which is just supplied as convenience, as Uwe mentions.

Robert Muir
added a comment - 06/Mar/12 14:44 -1
I don't think we should code-dup entire packages just because of the viral nature of maven
(to make a maven release of X, all of its dependencies must also be in maven, too).
That seems like the root problem here. Perhaps a blocker issue should be opened in maven's JIRA instead.

Uwe Schindler
added a comment - 06/Mar/12 14:54 Just to add: Should we also clone/fork the whole Mortbay Jetty repository or Apache Xerces into Solr's tree? For both libraries we also have bugfixes but upstream did not release an update until now?
What will user X do, if the probject he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.

Uwe Schindler
added a comment - 06/Mar/12 15:04 One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done?
Uwe

Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.

As for snapshots, that doesn't really solve the problem as you can't release with a dependency on a snapshot.

If the intent was just to use the jar internally in a war file that your customers don't change then somewhere in your build you have a mistake as the jar is apparently being made available to the public. If, however, the dependency is required for users or Solr to do builds then you need to change the package names so that they don't conflict with those in Commons.

Ralph Goers
added a comment - 06/Mar/12 15:17 Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.
As for snapshots, that doesn't really solve the problem as you can't release with a dependency on a snapshot.
If the intent was just to use the jar internally in a war file that your customers don't change then somewhere in your build you have a mistake as the jar is apparently being made available to the public. If, however, the dependency is required for users or Solr to do builds then you need to change the package names so that they don't conflict with those in Commons.

Its frequently the case that lucene/solr depend on things that are not in maven. When this is the case, the guys who work on maven here
do whatever is necessary to make maven work (I really don't know the details, most of us just use ant).

This is purely a maven issue, and as mentioned earlier, maven is provided as a convenience for build configuration, no different from
convenience build configurations provided for Eclipse or Intellij IDEA.

Thats why I said: we aren't releasing anything. there is just commons-csv-1.0-SNAPSHOT-r966014.jar in our solr/lib. Our ant build system
packages this up into solr.war

As for maven, eclipse, intellij, netbeans, whatever, it doesnt really matter to me how they work, but supporting those alternative
build configurations != releasing code.

Robert Muir
added a comment - 06/Mar/12 15:28 Ralph: see Uwe's comment.
Its frequently the case that lucene/solr depend on things that are not in maven. When this is the case, the guys who work on maven here
do whatever is necessary to make maven work (I really don't know the details, most of us just use ant).
This is purely a maven issue, and as mentioned earlier, maven is provided as a convenience for build configuration, no different from
convenience build configurations provided for Eclipse or Intellij IDEA.
Thats why I said: we aren't releasing anything. there is just commons-csv-1.0-SNAPSHOT-r966014.jar in our solr/lib. Our ant build system
packages this up into solr.war
As for maven, eclipse, intellij, netbeans, whatever, it doesnt really matter to me how they work, but supporting those alternative
build configurations != releasing code.

One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done?

Commons CSV snapshots are available in the Apache snapshot repository, but when you release Solr in the central repository all its dependencies have to be there as well, so that won't work.

What will user X do, if the project he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.

Once Commons CSV is released we guarantee the binary compatibility of the next releases. The Apache Commons team takes compatibility issues very seriously because our components are reused in many projects. When we have to break the compatibility the code is moved to a new package to avoid conflicts. For example Commons Lang 3 broke the binary compatibility with the previous releases and used a new namespace org.apache.commons.lang3 to avoid issues.

If Solr wants to shield its user from incompatibilities, even after Commons CSV is officially released, it's possible to automate the process of renaming the package of the dependencies. That's what Tomcat does with Commons DBCP. Tools like the Maven Shade plugin or the JarJar Ant tasks are designed for this purpose.

Emmanuel Bourg
added a comment - 06/Mar/12 15:31
One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done?
Commons CSV snapshots are available in the Apache snapshot repository, but when you release Solr in the central repository all its dependencies have to be there as well, so that won't work.
What will user X do, if the project he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.
Once Commons CSV is released we guarantee the binary compatibility of the next releases. The Apache Commons team takes compatibility issues very seriously because our components are reused in many projects. When we have to break the compatibility the code is moved to a new package to avoid conflicts. For example Commons Lang 3 broke the binary compatibility with the previous releases and used a new namespace org.apache.commons.lang3 to avoid issues.
If Solr wants to shield its user from incompatibilities, even after Commons CSV is officially released, it's possible to automate the process of renaming the package of the dependencies. That's what Tomcat does with Commons DBCP. Tools like the Maven Shade plugin or the JarJar Ant tasks are designed for this purpose.

Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.

We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.

Uwe Schindler
added a comment - 06/Mar/12 15:31 Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.
We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.

You guys don't like maven - we get that (who really does?) ...but that's not the point. For now let's just assume your bugfix jars did not get released to maven central. (That's another issue to deal with)

What about Solr users that now have some unreleases Xerces, Jetty und Commons classes in their classpath? Maybe even without knowing? If you don't hate your users - don't do it.

Maintaining upstreaming patches is painful - but that's how it is. Push for a release, maintain a patch or (only worst case) fork.

But there is an option somewhere in the middle is using using tools like minijar/the shade plugin/jarjar. They let you do the rewriting of the package transparently on the byte code level. That makes getting back to the releases version easier. Or you do it like Tomcat - they so some search and replace in their build.

I your specific case I would keep the bugfix jars in your lib dir (just like you have now) but modify your build to include a relocated version of your bugfix jars into the Solr jar (by using jarjar). Problem solved!

No one needs those bugfix version in a repo and it's super easy to fall back to the official release once out.

There are ways around big code duplications and still avoid the namespace problem.
Come on guys - be good to the world

Torsten Curdt
added a comment - 06/Mar/12 15:37 You guys don't like maven - we get that (who really does?) ...but that's not the point. For now let's just assume your bugfix jars did not get released to maven central. (That's another issue to deal with)
What about Solr users that now have some unreleases Xerces, Jetty und Commons classes in their classpath? Maybe even without knowing? If you don't hate your users - don't do it.
Maintaining upstreaming patches is painful - but that's how it is. Push for a release, maintain a patch or (only worst case) fork.
But there is an option somewhere in the middle is using using tools like minijar/the shade plugin/jarjar. They let you do the rewriting of the package transparently on the byte code level. That makes getting back to the releases version easier. Or you do it like Tomcat - they so some search and replace in their build.
I your specific case I would keep the bugfix jars in your lib dir (just like you have now) but modify your build to include a relocated version of your bugfix jars into the Solr jar (by using jarjar). Problem solved!
No one needs those bugfix version in a repo and it's super easy to fall back to the official release once out.
There are ways around big code duplications and still avoid the namespace problem.
Come on guys - be good to the world

This is a classpath issue - a class can be loaded only once in a given classpath.
It's vitally important to ensure that each class has only a single definition in a classpath, otherwise all sorts of failures can occur. These failures can be extremely difficult to debug.

Maven uses the groupId/artifactId to resolve multiple references to the same jar.
It picks one copy of the most recent version.

This only works properly if the Maven ids are unique for each package.
It does not read the package names within the jar; it assumes that the POM is correct for the associated jar.

In this case, the Solr pom uses its own unique groupId/artifactId, but the package names used within the jar (and indeed the pom inside the jar) are for an entirely different package. This is totally contrary to the way Maven is supposed to be used.

The consequence is that Maven does not know that the renamed "solr" commons-csv jar uses the same package as the "real" commons-csv, so it cannot prevent duplicate classes on the classpath.

Sebb
added a comment - 06/Mar/12 15:38 This is a classpath issue - a class can be loaded only once in a given classpath.
It's vitally important to ensure that each class has only a single definition in a classpath, otherwise all sorts of failures can occur. These failures can be extremely difficult to debug.
Maven uses the groupId/artifactId to resolve multiple references to the same jar.
It picks one copy of the most recent version.
This only works properly if the Maven ids are unique for each package.
It does not read the package names within the jar; it assumes that the POM is correct for the associated jar.
In this case, the Solr pom uses its own unique groupId/artifactId, but the package names used within the jar (and indeed the pom inside the jar) are for an entirely different package. This is totally contrary to the way Maven is supposed to be used.
The consequence is that Maven does not know that the renamed "solr" commons-csv jar uses the same package as the "real" commons-csv, so it cannot prevent duplicate classes on the classpath.

We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.

Because of Solr's dependency on a snapshot of commons-csv, and because Maven releases are not allowed to depend on snapshots, we have to publish the snapshot, renamed not to appear to be a snapshot. Simply removing it is not an option, assuming we will continue to provide Maven artifacts as part of the Lucene/Solr release process.

Steve Rowe
added a comment - 06/Mar/12 15:44 I think we should fix this.
We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.
Because of Solr's dependency on a snapshot of commons-csv, and because Maven releases are not allowed to depend on snapshots, we have to publish the snapshot, renamed not to appear to be a snapshot. Simply removing it is not an option, assuming we will continue to provide Maven artifacts as part of the Lucene/Solr release process.

Robert Muir
added a comment - 06/Mar/12 15:44
Maintaining upstreaming patches is painful - but that's how it is.
If there is a bug in some 3rd party component: i care about fixing the bug.
I don't care about maven, I'm not going to do the extra work to deal with these viral issues and its not a blocker for releasing 3.6

Emmanuel Bourg
added a comment - 06/Mar/12 15:45 There is a similar issue with the solr-carrot2-core, solr-noggit, solr-uima and solr-jsonic artifacts, and probably more.
You were right to -1 my patch Robert, the issue is much deeper.

I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name. We should create jarjared versions of all those changed snapshot dpenendecies, so the use o.a.solr.internal./o.a.lucene.internal. as package name (like Java does for Xerces/Xalan/Rhino).

Those jars would be published by maven as it is now, but would never conflict.

Uwe Schindler
added a comment - 06/Mar/12 15:51 I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name. We should create jarjared versions of all those changed snapshot dpenendecies, so the use o.a.solr.internal. /o.a.lucene.internal. as package name (like Java does for Xerces/Xalan/Rhino).
Those jars would be published by maven as it is now, but would never conflict.

Torsten Curdt
added a comment - 06/Mar/12 15:59 Steve, no you don't have to publish those snapshots. Please read my previous comment about inlining the bugfix jars.
Please folks, look at http://code.google.com/p/jarjar/
It's an easy fix and you would make many people happy.

I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name.

This sets an unreasonable burden to fix a simple bug in a third-party component (think, the jetty patch).

Its already hard enough to dive in as "stranger in a strange land" to some other codebase to fix a bug that affects our stuff,
and then to rename anything that uses it too just because of maven?

Lets take the jetty case... should we seriously rename all the jetty stuff because it has a unicode bug? what about configuration
and reflection and stuff for this servlet container? E.g. users will no longer be able to use any normal jetty configuration or
jetty documentation because we re-jarjared all the package names?

Robert Muir
added a comment - 06/Mar/12 16:00
I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name.
This sets an unreasonable burden to fix a simple bug in a third-party component (think, the jetty patch).
Its already hard enough to dive in as "stranger in a strange land" to some other codebase to fix a bug that affects our stuff,
and then to rename anything that uses it too just because of maven?
Lets take the jetty case... should we seriously rename all the jetty stuff because it has a unicode bug? what about configuration
and reflection and stuff for this servlet container? E.g. users will no longer be able to use any normal jetty configuration or
jetty documentation because we re-jarjared all the package names?
<Configure id="Server" class="org.eclipse.jetty.server.Server">
<!-- basic configuration here -->
</Configure>
etc
this is insane.

Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.

Torsten Curdt
added a comment - 06/Mar/12 16:04 Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.

Well, I do see the problem with jetty but doubt the other dependencies are that problematic.
{quote

Jarjaring is dangerous also because of reflection in from other libraries.

For example, if we find a bug in ICU, submit it, and they commit it, we should be able to
depend upon a snapshot unreleased build of ICU, yet, at the same-time, arabic PDF extraction
should still work (apache PDFBox reflects the icu package names to determine if this works).

So renaming packages with jarjar is not safe for a number of reasons, I really don't think
we should do this just to satisfy maven.

Robert Muir
added a comment - 06/Mar/12 16:16
Well, I do see the problem with jetty but doubt the other dependencies are that problematic.
{quote
Jarjaring is dangerous also because of reflection in from other libraries.
For example, if we find a bug in ICU, submit it, and they commit it, we should be able to
depend upon a snapshot unreleased build of ICU, yet, at the same-time, arabic PDF extraction
should still work (apache PDFBox reflects the icu package names to determine if this works).
So renaming packages with jarjar is not safe for a number of reasons, I really don't think
we should do this just to satisfy maven.

Robert: Its not only a maven problem, its the DLL ähm JAR-hell So I agree with all others that having patched JARs is a no-go for production, because nobody is able to then satisfy all dependencies.

On the other hand, Solr is a end-user product, so I have no idea why it is in maven at all. You download the distribution and install it. If you need to compile plugins against it, setup your eclipse classpath. Lucene is different, but it has no patched JARs (it does not depend on any external JAR at all). The Xerces JAR in benchmark is also only a "utility" package, that should not be in maven.

To come back to the example:

Jetty should not be a problem, because Solr will upgrade to Jetty 8 (breaking lots of backwards). For the Jetty 6 series there is no support at all, so there will not be a new release. Our fixed Jetty 6 is just a plain jetty with one broken unicode method patched. If somebody accidently have it in his classpath, he will be happy that Chinese, Japanese, and Mormon UTF8 works correct...

For the other JARs that have no other relations except to Solr, JARJARing them is no issue at all. It would be what I would recommend. This affects commons JARs like commons-csv, also noggit,

The ICU example is just a minor "example", we will not need to patch ICU, because they are releasing quite often.

If you have reflection dependecies or dependencies from other JARs in you classpath, you can still install the original unpatched version and you are happy. They will not conflict.

Uwe Schindler
added a comment - 06/Mar/12 16:37 Robert: Its not only a maven problem, its the DLL ähm JAR-hell So I agree with all others that having patched JARs is a no-go for production, because nobody is able to then satisfy all dependencies.
On the other hand, Solr is a end-user product, so I have no idea why it is in maven at all. You download the distribution and install it. If you need to compile plugins against it, setup your eclipse classpath. Lucene is different, but it has no patched JARs (it does not depend on any external JAR at all). The Xerces JAR in benchmark is also only a "utility" package, that should not be in maven.
To come back to the example:
Jetty should not be a problem, because Solr will upgrade to Jetty 8 (breaking lots of backwards). For the Jetty 6 series there is no support at all, so there will not be a new release. Our fixed Jetty 6 is just a plain jetty with one broken unicode method patched. If somebody accidently have it in his classpath, he will be happy that Chinese, Japanese, and Mormon UTF8 works correct...
For the other JARs that have no other relations except to Solr, JARJARing them is no issue at all. It would be what I would recommend. This affects commons JARs like commons-csv, also noggit,
The ICU example is just a minor "example", we will not need to patch ICU, because they are releasing quite often.
If you have reflection dependecies or dependencies from other JARs in you classpath, you can still install the original unpatched version and you are happy. They will not conflict.

Not close enough to satisfy Solr's pace of releases I think. Any help would be appreciated, the API has been reworked recently to take advantage of Java 5 features. A feedback on how it integrates with Solr would be welcome.

Emmanuel Bourg
added a comment - 06/Mar/12 16:54 Not close enough to satisfy Solr's pace of releases I think. Any help would be appreciated, the API has been reworked recently to take advantage of Java 5 features. A feedback on how it integrates with Solr would be welcome.

I propose we jarjar only the currently used commons-csv snapshot, and leave all other jars as-is. (Uwe, you mentioned noggit, but it has never been released on its own, so I don't think it's necessary to jarjar it.)

A feedback on how it integrates with Solr would be welcome.

The 3.6 release is going to happen soon; it seems unwise to upgrade on branch_3x at this point.

On trunk, however, upgrading the commons-csv snapshot to a more recent revision is a good idea, to prepare for the (eventual) release.

Steve Rowe
added a comment - 06/Mar/12 17:11 I propose we jarjar only the currently used commons-csv snapshot, and leave all other jars as-is. (Uwe, you mentioned noggit, but it has never been released on its own, so I don't think it's necessary to jarjar it.)
A feedback on how it integrates with Solr would be welcome.
The 3.6 release is going to happen soon; it seems unwise to upgrade on branch_3x at this point.
On trunk, however, upgrading the commons-csv snapshot to a more recent revision is a good idea, to prepare for the (eventual) release.

Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.

Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.

Steve Rowe
added a comment - 06/Mar/12 17:17 Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.
Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.

This is not possible Ryan. But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 and stop publishing new artifacts to Maven central besides the solr/lucene specific ones. This will stop the proliferation of conflicting artifacts in the Maven repository, but it won't fix the classpath issue for projects using Solr and Commons CSV.

Emmanuel Bourg
added a comment - 06/Mar/12 17:22 This is not possible Ryan. But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 and stop publishing new artifacts to Maven central besides the solr/lucene specific ones. This will stop the proliferation of conflicting artifacts in the Maven repository, but it won't fix the classpath issue for projects using Solr and Commons CSV.

Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build?

I don't know the answer, but I object to going down this road. There is a reason for not depending on snapshots: they change, often incompatibly. Having internal snapshots isn't just a sneaky workaround - it fixes the underlying problem (potentially incompatible changes), while also, of course, creating a different problem...

We have an example in the specific case of commons-csv: as Emmanuel said, the API has changed (very likely incompatibly) since our internal snapshot was taken.

Steve Rowe
added a comment - 06/Mar/12 17:26 Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build?
I don't know the answer, but I object to going down this road. There is a reason for not depending on snapshots: they change, often incompatibly. Having internal snapshots isn't just a sneaky workaround - it fixes the underlying problem (potentially incompatible changes), while also, of course, creating a different problem...
We have an example in the specific case of commons-csv: as Emmanuel said, the API has changed (very likely incompatibly) since our internal snapshot was taken.

Steve Rowe
added a comment - 06/Mar/12 17:32 But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6
I think this is a good idea, and would require very little effort. I'll make a patch.

Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.

Steven, this will not increase the global size of the distribution, it might even reduce it. You are trading a core Solr jar with a few modified dependent jars, all downloaded from the central Maven repository, for a unique jar containing Solr and it dependencies, and possibly only the necessary classes. For example Solr doesn't use the writer package of Commons CSV, it can be automatically removed in the process to reduce the size of the dependencies.

Emmanuel Bourg
added a comment - 06/Mar/12 17:34
Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.
Steven, this will not increase the global size of the distribution, it might even reduce it. You are trading a core Solr jar with a few modified dependent jars, all downloaded from the central Maven repository, for a unique jar containing Solr and it dependencies, and possibly only the necessary classes. For example Solr doesn't use the writer package of Commons CSV, it can be automatically removed in the process to reduce the size of the dependencies.
Still not interested?

Steven, you created this Maven problem by releasing conflicting artifacts to the central repository.

In fact, I am personally responsible for putting the conflicting artifacts in the central repository. It is also largely as a result of my efforts that Lucene/Solr continues to publish Maven artifacts at all.

Refusing to fix it is irresponsible.

Only if one assumes that refusing to implement your proposed fix is equivalent to refusing to do anything? I suspect we disagree on this point.

Steve Rowe
added a comment - 06/Mar/12 18:18 Steven, you created this Maven problem by releasing conflicting artifacts to the central repository.
In fact, I am personally responsible for putting the conflicting artifacts in the central repository. It is also largely as a result of my efforts that Lucene/Solr continues to publish Maven artifacts at all.
Refusing to fix it is irresponsible.
Only if one assumes that refusing to implement your proposed fix is equivalent to refusing to do anything? I suspect we disagree on this point.

Lets keep our heads cool here. To be clear, Steven wants to help and is doing what he can. We need to figure a way to help within the limits of the lucene community. Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits.

The obvious best solution is to point to an official commons-csv release.

Short of that, we need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package.

Ryan McKinley
added a comment - 06/Mar/12 18:40 Steven, you created this Maven problem
Lets keep our heads cool here. To be clear, Steven wants to help and is doing what he can. We need to figure a way to help within the limits of the lucene community. Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits.
The obvious best solution is to point to an official commons-csv release.
Short of that, we need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package.

Robert Muir
added a comment - 06/Mar/12 19:08
Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits.
Its not unjustified animosity towards maven: this issue is solely maven

Steven I wasn't referring to you explicitly, but to the Solr community. I didn't know you actually deployed the artifacts. If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity

I'm glad to hear you are willing to do something, what's your plan? Let me recap the options currently identified:

Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc)

Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.

Keep publishing Solr specific dependencies but after renaming the packages with JarJar

Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)

Stop releasing Maven artifacts completely, even Solr core.

Pointing to the 3.5 artifacts is a temporary solution, because:

the classpath conflicts still exists for projects importing solr-core

the next time Solr needs a patched unreleased dependency it will be problematic

Emmanuel Bourg
added a comment - 06/Mar/12 19:27 - edited Steven I wasn't referring to you explicitly, but to the Solr community. I didn't know you actually deployed the artifacts. If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity
I'm glad to hear you are willing to do something, what's your plan? Let me recap the options currently identified:
Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc)
Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.
Keep publishing Solr specific dependencies but after renaming the packages with JarJar
Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)
Stop releasing Maven artifacts completely, even Solr core.
Pointing to the 3.5 artifacts is a temporary solution, because:
the classpath conflicts still exists for projects importing solr-core
the next time Solr needs a patched unreleased dependency it will be problematic

Without all of Steven's hard work struggling with Maven, while working
within the constraints of the other Lucene/Solr committers (who
generally, strongly, dislike the impositions Maven places on a
project's development, case in point), there would be no Maven
artifacts for Lucene nor Solr.

Seriously, Steven is the ONLY reason why Lucene and Solr have, and
continue to have, Maven artifacts.

Michael McCandless
added a comment - 06/Mar/12 20:23 To be clear, Steven wants to help and is doing what he can.
I would make this statement far stronger:
Without all of Steven's hard work struggling with Maven, while working
within the constraints of the other Lucene/Solr committers (who
generally, strongly, dislike the impositions Maven places on a
project's development, case in point), there would be no Maven
artifacts for Lucene nor Solr.
Seriously, Steven is the ONLY reason why Lucene and Solr have, and
continue to have, Maven artifacts.
Thank you Steven!!

Steve Rowe
added a comment - 06/Mar/12 23:06 Pointing to the 3.5 artifacts is a temporary solution
I agree. I also agree with Ryan: " [W] e need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package."
If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity
I actually like Maven for some things.
"Great blow"? I sincerely doubt that this is the first time something like this has happened.
"Maven integrity"? Oxymoronic much?

2. Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.

-1. Again, too intrusive.

3. Keep publishing Solr specific dependencies but after renaming the packages with JarJar

I think this is a possibility.

4. Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)

-0. The project would require a new module for each dependency. Seems too heavy to me.

Steve Rowe
added a comment - 06/Mar/12 23:46 I'm glad to hear you are willing to do something, what's your plan?
I don't have one yet. What's your plan for releasing commons-csv?
Let me recap the options currently identified:
1. Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc)
-1. I think this is too intrusive. Maven's requirement cannot dictate Lucene/Solr Ant build structure.
2. Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.
-1. Again, too intrusive.
3. Keep publishing Solr specific dependencies but after renaming the packages with JarJar
I think this is a possibility.
4. Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)
-0. The project would require a new module for each dependency. Seems too heavy to me.
5. Stop releasing Maven artifacts completely, even Solr core.
Fantastic! Thanks for your input! You're, like, awesome!

Steve Rowe
added a comment - 07/Mar/12 00:06 Emmanuel, you neglected to include transitive exclusion in your "identified options":
6. Ryan suggested excluding solr-commons-csv from solr-core's transitive dependencies: https://issues.apache.org/jira/browse/SOLR-3204?focusedCommentId=13222930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13222930
This puts the burden on users, who won't know it's necessary until after they run into classpath trouble. We should figure out something better.

Not knowing anything about maven, but just comparing this to the releases that solr users download, the main
difference seems to be that in general the 'artifact' for an ordinary user is a .war file, a server-side
webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that?

Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file,
such as any external dependencies?

Robert Muir
added a comment - 07/Mar/12 00:37
We should figure out something better.
Not knowing anything about maven, but just comparing this to the releases that solr users download, the main
difference seems to be that in general the 'artifact' for an ordinary user is a .war file, a server-side
webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that?
Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file,
such as any external dependencies?

@Steven Yes that's a possible workaround for Solr users, as long as they don't touch directly or indirectly the CSV related classes. But that solution doesn't spare a necessary modification of the Solr build with regard to patched dependencies.

Emmanuel Bourg
added a comment - 07/Mar/12 00:40 - edited @Steven Yes that's a possible workaround for Solr users, as long as they don't touch directly or indirectly the CSV related classes. But that solution doesn't spare a necessary modification of the Solr build with regard to patched dependencies.

Solr is a end-user product, so I have no idea why it is in maven at all.

the 'artifact' for an ordinary user is a .war file, a server-side webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that? Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file, such as any external dependencies?

Solr can be used in embedded fashion (though I don't have any experience with that).

Steve Rowe
added a comment - 07/Mar/12 00:44 - edited Solr is a end-user product, so I have no idea why it is in maven at all.
the 'artifact' for an ordinary user is a .war file, a server-side webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that? Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file, such as any external dependencies?
Solr can be used in embedded fashion (though I don't have any experience with that).
Also, solrj is a client jar.
edit : oops, solrj isn't relevant here - we're talking about solr-core.

The real problem (regardless of ant/maven) is for users that want to use o.a.c.csv classes that are not the ones in the solr distribution. This can happen when people have plugins using something based on a recent CSV build – something will break, either their code, or the CSVResponseWriter

This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace. I understand why the csv folks are pissed, and think it is fair we look for ways to fix it.

I am now +1 on Emmanuel's patch to embed o.a.solr.csv in the solr core jar file – we should have notes that it won't be supported once a real csv is released.

Ryan McKinley
added a comment - 07/Mar/12 00:56 The real problem (regardless of ant/maven) is for users that want to use o.a.c.csv classes that are not the ones in the solr distribution. This can happen when people have plugins using something based on a recent CSV build – something will break, either their code, or the CSVResponseWriter
This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace. I understand why the csv folks are pissed, and think it is fair we look for ways to fix it.
I am now +1 on Emmanuel's patch to embed o.a.solr.csv in the solr core jar file – we should have notes that it won't be supported once a real csv is released.

Robert Muir
added a comment - 07/Mar/12 01:53
This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace.
This is a 100% maven and only maven issue.
For the ant build we are just using a snapshot jar in solr/lib, not 'releasing' nor 'putting classes' anywhere.
Its an implementation detail of solr. Its simply a dependency.

Sorry, by 'put classes', i mean that when commons-csv-1.0-SNAPSHOT-r966014.jar is in the runtime classpath it uses classes in the namespace org.apache.commons.csv

By using this namespace, it prevents people from using other dependencies with that same package name reliably. The commons-csv folks are understandably upset that that means their jar file does not work as expected within solr.

Consider the case where someone deploys the solr.war file (built by ant) with a plugin that uses the current commons-csv.jar from /trunk. something will break – either the CSVRequestHandler or their plugin depending on what is first in the classpath. (I think their plugin would always fail since it is loaded second)

I think we have gotten lost in the maven discussion, when the real problem is about how our CSV.jar interacts with plugins (or anything else in the runtime classpath)

Ryan McKinley
added a comment - 07/Mar/12 02:31 Sorry, by 'put classes', i mean that when commons-csv-1.0-SNAPSHOT-r966014.jar is in the runtime classpath it uses classes in the namespace org.apache.commons.csv
By using this namespace, it prevents people from using other dependencies with that same package name reliably. The commons-csv folks are understandably upset that that means their jar file does not work as expected within solr.
Consider the case where someone deploys the solr.war file (built by ant) with a plugin that uses the current commons-csv.jar from /trunk. something will break – either the CSVRequestHandler or their plugin depending on what is first in the classpath. (I think their plugin would always fail since it is loaded second)
I think we have gotten lost in the maven discussion, when the real problem is about how our CSV.jar interacts with plugins (or anything else in the runtime classpath)

This issue isnt about plugins, having commons-csv in the classpath only affect users of solr.

Please don't try to spin the issue, this issue is all about maven and only about maven:

the very title of this jira issue 'solr-commons-csv' is a maven-only concept

as stated by Emmanuel in his first response back 'Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects'

Robert Muir
added a comment - 07/Mar/12 02:43 This issue isnt about plugins, having commons-csv in the classpath only affect users of solr .
Please don't try to spin the issue, this issue is all about maven and only about maven:
the very title of this jira issue 'solr-commons-csv' is a maven-only concept
as stated by Emmanuel in his first response back 'Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects'
So thats the only issue, maven artifacts.

Ryan McKinley
added a comment - 07/Mar/12 05:08 I'm not trying to spin anything – I'm trying to figure out what the real issue is. Part of it is obviously the maven release. Emmanuel can you tell us more about how this came up?

Your analysis is correct Ryan. Anyone building upon the Solr code is affected by this issue. The fact the offending artifacts are in the central Maven repository only makes the problem worse,

I discovered this issue because someone who imported solr-commons-csv in its project reported an issue (SANDBOX-330), an infinite loop with trailing comments, that was already fixed on the trunk. I didn't know about the solr-commons-csv artifact in the central Maven repository until then. So I inspected the jar and gasped in awe at the unmodified namespace.

I understand that Maven as a build tool is controversial, I'm myself an heavy Ant user on several projects requiring complex build setups. But Maven, as a way to share code through a central repository, is also a very valuable tool. It's adopted by alternative build tools like Ivy and Gradle. It's also used by IDEs to import dependencies automatically and to be able to browse their source and Javadoc easily. It's difficult to understand why this part is also hated by Maven detractors. The constraints on the central repository aren't that unbearable, the most important rule to follow is to not publish two artifacts identified by distinct groupId:artifactId with conflicting classes. The corollary is that a project A should not publish classes in the namespace of a project B.

Emmanuel Bourg
added a comment - 07/Mar/12 09:22 Your analysis is correct Ryan. Anyone building upon the Solr code is affected by this issue. The fact the offending artifacts are in the central Maven repository only makes the problem worse,
I discovered this issue because someone who imported solr-commons-csv in its project reported an issue ( SANDBOX-330 ), an infinite loop with trailing comments, that was already fixed on the trunk. I didn't know about the solr-commons-csv artifact in the central Maven repository until then. So I inspected the jar and gasped in awe at the unmodified namespace.
I understand that Maven as a build tool is controversial, I'm myself an heavy Ant user on several projects requiring complex build setups. But Maven, as a way to share code through a central repository, is also a very valuable tool. It's adopted by alternative build tools like Ivy and Gradle. It's also used by IDEs to import dependencies automatically and to be able to browse their source and Javadoc easily. It's difficult to understand why this part is also hated by Maven detractors. The constraints on the central repository aren't that unbearable, the most important rule to follow is to not publish two artifacts identified by distinct groupId:artifactId with conflicting classes. The corollary is that a project A should not publish classes in the namespace of a project B.

The arguments that this is more than a maven issue are bogus.
Do you take me for a fool?

The same arguments could be had for even released dependencies like httpclient-3.0 that we use. What if a user wants to use 4.0?

To ant, whether its released or unreleased, its just a jar in a /lib and in the users classpath.
But thats not the real issue here: These people are angry about maven repositories.

So quit screwing around with crazy ideas about duplicating whole codebases and fix the real problem:

open a bug report at Maven with the Description:

Allow maven project A to depend upon project B without B being in maven at all

Thats the bug, it stops the replication of the virus, and i don't see any reason why maven wouldnt want to fix it? If they object, then it only proves my point that its really the GPLv3 of build tools.

Robert Muir
added a comment - 07/Mar/12 11:09 The arguments that this is more than a maven issue are bogus.
Do you take me for a fool?
The same arguments could be had for even released dependencies like httpclient-3.0 that we use. What if a user wants to use 4.0?
To ant, whether its released or unreleased, its just a jar in a /lib and in the users classpath.
But thats not the real issue here: These people are angry about maven repositories.
So quit screwing around with crazy ideas about duplicating whole codebases and fix the real problem:
open a bug report at Maven with the Description:
Allow maven project A to depend upon project B without B being in maven at all
Thats the bug, it stops the replication of the virus, and i don't see any reason why maven wouldnt want to fix it? If they object, then it only proves my point that its really the GPLv3 of build tools.

The short answer is that the Lucene PMC is releasing artifacts (to the maven repository) in the org.apache.commons namespace.

It doesn't matter how the release is done. You're doing it. You have released Commons code to the public. Period. Stop pointing fingers at tools. The Lucene PMC has released code in another project's namespace, which is incompatible with the direction that project wants to take the namespace. The Lucene PMC has no right to usurp the direction of other projects here at the ASF.

If you cannot get the tool to avoid releasing/interfering with other projects' code, then stop using the tool. Seriously.

The Lucene PMC's releases should stop interfering with other projects' namespaces. Find a way to fix it. Stop blaming tools. Start fixing.

Greg Stein
added a comment - 07/Mar/12 11:24 Maven is a red herring.
The short answer is that the Lucene PMC is releasing artifacts (to the maven repository) in the org.apache.commons namespace.
It doesn't matter how the release is done. You're doing it. You have released Commons code to the public. Period. Stop pointing fingers at tools. The Lucene PMC has released code in another project's namespace, which is incompatible with the direction that project wants to take the namespace. The Lucene PMC has no right to usurp the direction of other projects here at the ASF.
If you cannot get the tool to avoid releasing/interfering with other projects' code, then stop using the tool. Seriously.
The Lucene PMC's releases should stop interfering with other projects' namespaces. Find a way to fix it. Stop blaming tools. Start fixing.

I guess I'm missing something. We aren't part of the Commons project. The Commons code is Apache licensed, we are using it as we see fit under the T&C of the ASL. What prevents someone else from going and doing the same thing outside of Apache? In fact, it happens all the time, even at Apache. Commons CLI 2 is the biggest one, as I know of several projects that "release" that b/c Commons itself never has. Here are also several who do it for Lucene. The problem is Maven and it isn't going away.

Grant Ingersoll
added a comment - 07/Mar/12 13:18 - edited I guess I'm missing something. We aren't part of the Commons project. The Commons code is Apache licensed, we are using it as we see fit under the T&C of the ASL. What prevents someone else from going and doing the same thing outside of Apache? In fact, it happens all the time, even at Apache. Commons CLI 2 is the biggest one, as I know of several projects that "release" that b/c Commons itself never has. Here are also several who do it for Lucene. The problem is Maven and it isn't going away.
Here are some examples:
http://search.maven.org/#artifactdetails%7Corg.hibernate.apache.lucene.solr%7Capache-solr-analyzer%7C1.2.0%7Cjar
http://search.maven.org/#artifactdetails%7Corg.apache.clerezza.ext%7Corg.apache.lucene%7C0.1-incubating%7Cbundle
Tomcat does the same thing to Commons (file upload, for example):
http://search.maven.org/#search%7Cga%7C4%7Ccommons
I could keep going. Just do searches for popular Apache libraries in Maven and you will see what I mean. Maven is not a red herring. It is the problem.

Emmanuel Bourg
added a comment - 07/Mar/12 13:33 - edited That's not correct for Tomcat, the Commons components in Tomcat have a different namespace to avoid classpath conflicts.
Clerezza releases will have to be changed too.
Hibernate probably made a mistake in 2008, they haven't released a solr artifact since then, so I guess they fixed the issue.
EDIT: Indeed Hibernate fixed the dependency in Hibernate Search 3.3.0 Beta2, the dependency was introduced with the Beta 1

More that do it:
Geronimo (Apache) (Digester, Discovery)
JVNet-Hudson (commons-jelly)
com.kenai.nbpwr
ServiceMix (commons-csv)
Apache Directory does this weird thing where they package the Commons jars inside of another jar. Not sure what that does, but the nested jar is Commons. Not quite sure how the classloader would deal with that.
Mahout does it for Commons-CLI2 (b/c commons never seems to want to release it)

And that is just me searching for commons.

BTW, I don't actually think the problem is Maven. The problem is Java's classloader is broken. The way to fix it is to use a classloader that scopes dependencies properly. Otherwise, this is simply a known problem with Java and the solution is that one has to be careful about what versions of things one imports. It's exactly the same issue as when one has two dependencies that each have dependencies on different versions of commons-lang or Lucene or whats-it-called.jar.

In other words, I don't think this is worth us doing anything different than what we are already doing. Except, for seeing if we can upgrade our version of Commons CSV to an official release as a courtesy.

Grant Ingersoll
added a comment - 07/Mar/12 13:44 - edited More that do it:
Geronimo (Apache) (Digester, Discovery)
JVNet-Hudson (commons-jelly)
com.kenai.nbpwr
ServiceMix (commons-csv)
Apache Directory does this weird thing where they package the Commons jars inside of another jar. Not sure what that does, but the nested jar is Commons. Not quite sure how the classloader would deal with that.
Mahout does it for Commons-CLI2 (b/c commons never seems to want to release it)
And that is just me searching for commons.
BTW, I don't actually think the problem is Maven. The problem is Java's classloader is broken. The way to fix it is to use a classloader that scopes dependencies properly. Otherwise, this is simply a known problem with Java and the solution is that one has to be careful about what versions of things one imports. It's exactly the same issue as when one has two dependencies that each have dependencies on different versions of commons-lang or Lucene or whats-it-called.jar.
In other words, I don't think this is worth us doing anything different than what we are already doing. Except, for seeing if we can upgrade our version of Commons CSV to an official release as a courtesy.

Issue 1: a private, internal dependency in Solr on commons-csv (and
others) has become "publicly released" in Maven as solr-commons-csv.
I know very little about Maven, but this does sound bad. That said, I
don't think Maven has any right to "peek in" and force Solr's private
dependencies to become public; what deps Solr uses is an
implementation detail of Solr. Short of removing Solr from Maven, is
there some way to publish Solr's WAR through Maven while keeping (some
of?) its dependencies private?

Issue 2: JAR hell (runtime CLASSPATH pollution): this is an age old,
widespread issue, completely orthogonal to Maven. Regardless of how a
project builds its artifacts, if it has external dependencies, it will
of course "pollute" the CLASSPATH. This is a well known problem.

Really, we should open a new issue for issue 2: it has nothing to do
with Maven and is long-standing (our ant-produced artifacts have had
the issue forever). In practice, this has never seemed to be a
problem to users, presumably because users don't usually embed Solr.
Normally users deploy the WAR and use as-is, except for plugins. If
we really think this needs to be fixed (I'm doubtful), there are
various options... eg OSGI seems to "solve" this, but adds complexity.
Jarjar, minijar, ant replace, are also options, but I don't like the
risk of hard-to-explain bugs. We can discuss on a separate issue...

Michael McCandless
added a comment - 07/Mar/12 13:46 There are two very different issues mixed in here:
Issue 1: a private, internal dependency in Solr on commons-csv (and
others) has become "publicly released" in Maven as solr-commons-csv.
I know very little about Maven, but this does sound bad. That said, I
don't think Maven has any right to "peek in" and force Solr's private
dependencies to become public; what deps Solr uses is an
implementation detail of Solr. Short of removing Solr from Maven, is
there some way to publish Solr's WAR through Maven while keeping (some
of?) its dependencies private?
Issue 2: JAR hell (runtime CLASSPATH pollution): this is an age old,
widespread issue, completely orthogonal to Maven. Regardless of how a
project builds its artifacts, if it has external dependencies, it will
of course "pollute" the CLASSPATH. This is a well known problem.
Really, we should open a new issue for issue 2: it has nothing to do
with Maven and is long-standing (our ant-produced artifacts have had
the issue forever). In practice, this has never seemed to be a
problem to users, presumably because users don't usually embed Solr.
Normally users deploy the WAR and use as-is, except for plugins. If
we really think this needs to be fixed (I'm doubtful), there are
various options... eg OSGI seems to "solve" this, but adds complexity.
Jarjar, minijar, ant replace, are also options, but I don't like the
risk of hard-to-explain bugs. We can discuss on a separate issue...

There is no issue for users of the WAR, I think, which will generally be external to a project, and handles all of these bundling issues internally.

The issue arises in projects that depend on the various solr jars that are bundled in its WAR. Using Maven, it's as if you exploded the WAR, pulled all its jars into your project, and then added your own code too. As an example, we've done this in a project where I work so we can run EmbeddedSolrServer in our tests. I have it on my TODO list to break this dependency though since it's overkill to include all of solr in our project for this convenience.

Mike Sokolov
added a comment - 07/Mar/12 14:27 - edited There is no issue for users of the WAR, I think, which will generally be external to a project, and handles all of these bundling issues internally.
The issue arises in projects that depend on the various solr jars that are bundled in its WAR. Using Maven, it's as if you exploded the WAR, pulled all its jars into your project, and then added your own code too. As an example, we've done this in a project where I work so we can run EmbeddedSolrServer in our tests. I have it on my TODO list to break this dependency though since it's overkill to include all of solr in our project for this convenience.

As noted before, we cannot fix this for Jetty 6 by package-renaming using JarJar, as this would break almost everything in the jetty configuration (Jetty is JarJar unfriendly), but Jetty can be easily solved on our side by upgrading to Jetty 8 (also for Solr 3.6!!!!). The patch/branch is ready, just needs committing. We can then ship with unmodified Jetty 8, as the Unicode bugs should be fixed.

For the other projects I see no real problem in using jarjar (which is used in lots of projects): Lets simply investigate what foreign namespaces we use in our packages (e.g. we should also rename the Snowball classes, because we heavily forked them and they are still in org.tartarus namespace - as we forked that project, we should do it in the source code). For e.g. commons-csv we have no problem jarjaring our fork and re-release it. In the jarjar issue tracker is also a discussion about “hiding” renamed classes from GUIs by prefixing all renamed class names with $, so they are invisible for an autocomplete-using Eclipse developer, so nobody outside can use the jarjared classes outside Lucene/Solr.

About the remaining issues Robert mentioned: They are all solvable. JarJar renames also class names inside strings used for reflection, it can also rewrite Manifests and META-INF (we use e.g. for codecs, or Apache Xerces uses to plug in the parser impl). I just would like to give it a try, it cannot happen more than our testcases do not run. The example Robert has given with external references to icu4j is exactly what we want to prevent with JarJaring, so it doesn’t apply in my opinion. If we patch and modify icu4j and ship with a package-renamed version, code that uses reflection outside the Solr/Lucene JAR is completely unaffected. If the external project side-by-side with Lucene/Solr wants to use icu4j it can simply include the original in any version, solr with its patched version is completely unaffected and vice versa.

So I would strongly suggest to jarjar all dependencies that we modified locally (means all foreign jar-files with solr-prefix). How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

Uwe Schindler
added a comment - 07/Mar/12 14:46 As noted before, we cannot fix this for Jetty 6 by package-renaming using JarJar, as this would break almost everything in the jetty configuration (Jetty is JarJar unfriendly), but Jetty can be easily solved on our side by upgrading to Jetty 8 (also for Solr 3.6!!!!). The patch/branch is ready, just needs committing. We can then ship with unmodified Jetty 8, as the Unicode bugs should be fixed.
For the other projects I see no real problem in using jarjar (which is used in lots of projects): Lets simply investigate what foreign namespaces we use in our packages (e.g. we should also rename the Snowball classes, because we heavily forked them and they are still in org.tartarus namespace - as we forked that project, we should do it in the source code). For e.g. commons-csv we have no problem jarjaring our fork and re-release it. In the jarjar issue tracker is also a discussion about “hiding” renamed classes from GUIs by prefixing all renamed class names with $, so they are invisible for an autocomplete-using Eclipse developer, so nobody outside can use the jarjared classes outside Lucene/Solr.
About the remaining issues Robert mentioned: They are all solvable. JarJar renames also class names inside strings used for reflection, it can also rewrite Manifests and META-INF (we use e.g. for codecs, or Apache Xerces uses to plug in the parser impl). I just would like to give it a try, it cannot happen more than our testcases do not run. The example Robert has given with external references to icu4j is exactly what we want to prevent with JarJaring, so it doesn’t apply in my opinion. If we patch and modify icu4j and ship with a package-renamed version, code that uses reflection outside the Solr/Lucene JAR is completely unaffected. If the external project side-by-side with Lucene/Solr wants to use icu4j it can simply include the original in any version, solr with its patched version is completely unaffected and vice versa.
So I would strongly suggest to jarjar all dependencies that we modified locally (means all foreign jar-files with solr-prefix). How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

Robert Muir
added a comment - 07/Mar/12 14:55 How would this work for e.g. the carrot2 example? The carrot2 class names are referenced in user configuration files (solrconfig.xml):
For example:
<lst name="engine">
<!-- The name, only one can be named "default" -->
<str name="name">default</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
</lst>

Robert Muir
added a comment - 07/Mar/12 14:58 Also the UIMA integration has the same issue, configuration files contain their qualified class names, for example:
<type allAnnotatorFeatures="true">org.apache.uima.SentenceAnnotation</type>

How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.

Emmanuel Bourg
added a comment - 07/Mar/12 15:05
How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.
It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.

As Mike and Yonik mentioned - most of us don't know much about maven. I still wish it was a downstream issue. I still wish we would have voted on supporting it. With all due respect to Steve and his great work around it. I really wish it was not our problem - as I know many others do.

What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.

And -1 for continuing Maven support within the project for what its worth (Though I love your work and dedication to this Steve )

Mark Miller
added a comment - 07/Mar/12 15:19 As Mike and Yonik mentioned - most of us don't know much about maven. I still wish it was a downstream issue. I still wish we would have voted on supporting it. With all due respect to Steve and his great work around it. I really wish it was not our problem - as I know many others do.
What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.
And -1 for continuing Maven support within the project for what its worth (Though I love your work and dedication to this Steve )

What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.

I think the major issue is that there is not yet a released version. I think we would definitely prefer that.

Speaking from some experience: In my opinion CSV is deceptively complex and is a perfect example of where we
would rather 'downstream' that knowledge to someone who focuses on it and gives good performance.
Rolling our own is dangerous.

Robert Muir
added a comment - 07/Mar/12 15:23
What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.
I think the major issue is that there is not yet a released version. I think we would definitely prefer that.
Speaking from some experience: In my opinion CSV is deceptively complex and is a perfect example of where we
would rather 'downstream' that knowledge to someone who focuses on it and gives good performance.
Rolling our own is dangerous.
But there is not yet any official releases to use...

How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.

That's our committers decision. The fact here is, that Solr or even Lucene is not just a WAR file, its consisting of lot's of unrelated JAR artifacts and all of them should live on its own, so a user can remove features he does not need. So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that).

Uwe Schindler
added a comment - 07/Mar/12 15:28 - edited
How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.
It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.
That's our committers decision. The fact here is, that Solr or even Lucene is not just a WAR file, its consisting of lot's of unrelated JAR artifacts and all of them should live on its own, so a user can remove features he does not need. So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that).

It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.

Mark Miller
added a comment - 07/Mar/12 15:36 But there is not yet any official releases to use...
Ah, interesting. Perhaps we could get the favor of a release
It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.

It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.

Yes while this is the case, and as Emmanuel states, the purpose of his patch, its not a scalable solution in general.
Its also expensive to fork software... there is always the danger that we then become out-of-sync and never sync back up.

(feel free to hit me, for Uwe's example of snowball, which is a perfect example)

What about the other cases that fall into this same 'maven namespace' category, e.g. the UIMA case for example?

How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.

Robert Muir
added a comment - 07/Mar/12 15:43
It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.
Yes while this is the case, and as Emmanuel states, the purpose of his patch, its not a scalable solution in general.
Its also expensive to fork software... there is always the danger that we then become out-of-sync and never sync back up.
(feel free to hit me, for Uwe's example of snowball, which is a perfect example)
What about the other cases that fall into this same 'maven namespace' category, e.g. the UIMA case for example?
How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.

Robert Muir
added a comment - 07/Mar/12 15:48 And for the record, i want to withdraw my opposition to Emmanuel's patch.
I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it.
Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want
to see us depending upon a released version of this jar file.

+1 - I think this is the way to go, I just haven't understood any objections to it yet.

I still feel strongly this doesn't solve the 'real problem',

Hate to sound like a broken record jerk, but again I think the real problem is our support of Maven

I don't see how, short of dropping internal Maven support, we solve the real problem. And barring a full glorious solution, this is a small and simple thing we can do that makes the commons guys happy. If all of a sudden we are bombarded by issues like these, I guess we can burn that bridge later. And drop Maven

Mark Miller
added a comment - 07/Mar/12 15:52 That was the purpose of my patch Mark
+1 - I think this is the way to go, I just haven't understood any objections to it yet.
I still feel strongly this doesn't solve the 'real problem',
Hate to sound like a broken record jerk, but again I think the real problem is our support of Maven
I don't see how, short of dropping internal Maven support, we solve the real problem. And barring a full glorious solution, this is a small and simple thing we can do that makes the commons guys happy. If all of a sudden we are bombarded by issues like these, I guess we can burn that bridge later. And drop Maven

I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it.

Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want
to see us depending upon a released version of this jar file.

To prevent this I would prefer jarjar, its not risky here at all (and also not for other usitility libs from commons). I am already plaing around with that. What I did here was repackage the JAR file and used a simple serach/replace on our source files to refer to new package name. Not even maven config has to be changedm Steven Rowe can publish the solr-commons-csv.jar as he likes.

Uwe Schindler
added a comment - 07/Mar/12 16:02
I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it.
Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want
to see us depending upon a released version of this jar file.
To prevent this I would prefer jarjar, its not risky here at all (and also not for other usitility libs from commons). I am already plaing around with that. What I did here was repackage the JAR file and used a simple serach/replace on our source files to refer to new package name. Not even maven config has to be changedm Steven Rowe can publish the solr-commons-csv.jar as he likes.

Uwe: maybe we should do it just for this case (rather than offering a general solution?)

Basically what i'm trying to say is that the general problem is complex, I think there are enough examples (jetty, carrot2, uima, etc)
that fall in this category that aren't really solved by jarjar. I think we either need to solve this issue as a 'case-by-case' for commons-csv,
or fix the entire problem (which definitely, definitely, certainly, without a doubt, involves fixing maven).

But to be practical, lets come up with something so that commons-csv developers are happy and can issue a release (which we can then depend upon).

Keeping in mind that this is supposed to be a 3.6 minor release, can we apply the jarjar solution only to commons-csv?
I feel the other situations are more dangerous and I think we should be careful, but at the same time we should address
their concerns and put ourselves in a situation where we don't end out 'forking' it... I don't want to see that.

Robert Muir
added a comment - 07/Mar/12 16:09 Uwe: maybe we should do it just for this case (rather than offering a general solution?)
Basically what i'm trying to say is that the general problem is complex, I think there are enough examples (jetty, carrot2, uima, etc)
that fall in this category that aren't really solved by jarjar. I think we either need to solve this issue as a 'case-by-case' for commons-csv,
or fix the entire problem (which definitely, definitely, certainly, without a doubt, involves fixing maven).
But to be practical, lets come up with something so that commons-csv developers are happy and can issue a release (which we can then depend upon).
Keeping in mind that this is supposed to be a 3.6 minor release, can we apply the jarjar solution only to commons-csv?
I feel the other situations are more dangerous and I think we should be careful, but at the same time we should address
their concerns and put ourselves in a situation where we don't end out 'forking' it... I don't want to see that.

Steve Rowe
added a comment - 07/Mar/12 16:31 I think there are enough examples (jetty, carrot2, uima, etc) [...]
We are not currently publishing third party solr-uima-* artifacts, though we have in the past (versions prior to 3.5.0).

Uwe Schindler
added a comment - 07/Mar/12 16:33 - edited Here the patch and the transformed JAR file (and renamed). Most of the patch is my rename of the JAR file to solr- prefix. Code changes is 5 lines of import statements!
What I did:
copy the original JAR file to a folder
place jarjar-1.2.jar somewhere
edit the rules.txt file and change package names
run java -jar jarjar-1.2.jar process rule.txt commons-csv-1.0-SNAPSHOT-r966014.jar solr-commons-csv-1.0-SNAPSHOT-r966014.jar
place the resulting jar file in solr/core/lib (remove the old one before) and commit it like any other bundled artifact
All tests pass and the supplied JAR file could be published in maven with the name sarowe gave it.

it seems to me, because the jar is "pre-jar-jar'ed" we don't fall into various
IDE configuration conflicts either right? Its just as easy for developers to configure
their workspace (ant eclipse, ant idea)

Robert Muir
added a comment - 07/Mar/12 16:38 Thanks Uwe: i support your change. I think it is the best compromise.
it seems to me, because the jar is "pre-jar-jar'ed" we don't fall into various
IDE configuration conflicts either right? Its just as easy for developers to configure
their workspace (ant eclipse, ant idea)
+1

Uwe Schindler
added a comment - 07/Mar/12 16:48 The keep line is not even needed in the rules.txt. There is one small issue: The resulting JAR file contains an empty org.apache.commons folder and some maven relicts (die,die,die), but thats minor.

Uwe Schindler
added a comment - 07/Mar/12 17:08 Updated patch to make it mor consistent with the naming conventions of Solr.
I think we should do the same for the other file in lib: noggit
Steven Rowe: What other modified JARs do we have that are republished through maven?

Dawid Weiss
added a comment - 07/Mar/12 17:16 Don't jarjar Carrot2, please. We are plugging in our commercial clustering engine into Solr via Carrot2 controllers so if you change package namespace we will no longer be able to do it.

Trunk is the same, except that the carrot2-core jar is no longer required (it's required on branch_3x because it's a specially Java5-compiled jar, but no longer on trunk/4.0, which requires Java 6, and so can use the Maven central artifact.)

Steve Rowe
added a comment - 07/Mar/12 17:23 - edited Steven Rowe: What other modified JARs do we have that are republished through maven?
For branch_3x:
module
jar
Lucene benchmark
lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ-1257.jar
Solr clustering
solr/contrib/clustering/carrot2-core-3.5.0.jar
Solr langid
solr/contrib/langid/lib/jsonic-1.2.0.jar
Solr langid
solr/contrib/langid/lib/langdetect-r111-java5.jar
Solr core
solr/lib/apache-solr-noggit-r1099557.jar
Solr core
solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar
Trunk is the same, except that the carrot2-core jar is no longer required (it's required on branch_3x because it's a specially Java5-compiled jar, but no longer on trunk/4.0, which requires Java 6, and so can use the Maven central artifact.)

For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...

Uwe Schindler
added a comment - 07/Mar/12 17:26 For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...

Well the carrot is similar to the uima case, in both cases we have their committers also as committers working
on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?

Robert Muir
added a comment - 07/Mar/12 19:08 Well the carrot is similar to the uima case, in both cases we have their committers also as committers working
on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?

How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.

+1, this sums up the Maven issue.

But is it really true? Is there really no way to stop Maven from
making project A's private deps public? This is horribly invasive.

CLASSPATH polution is a separate, widespread, longstanding and very
real issue. I don't think jarjar'ing every dependency Solr (or our
modules, contribs, etc.) is a good general solution.

But I think it's OK in this case... progress not perfection. Your
patch looks good Uwe, thanks!

Michael McCandless
added a comment - 07/Mar/12 19:35
How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.
+1, this sums up the Maven issue.
But is it really true? Is there really no way to stop Maven from
making project A's private deps public? This is horribly invasive.
CLASSPATH polution is a separate, widespread, longstanding and very
real issue. I don't think jarjar'ing every dependency Solr (or our
modules, contribs, etc.) is a good general solution.
But I think it's OK in this case... progress not perfection. Your
patch looks good Uwe, thanks!

Ryan McKinley
added a comment - 07/Mar/12 19:53 +1 thanks Uwe
I don't think jarjar'ing every dependency Solr
The only jar files in question are snapshot/unreleased code. The commons-csv problem is a non issue if we were using a released version.

Michael McCandless
added a comment - 07/Mar/12 20:13 The commons-csv problem is a non issue if we were using a released version.
Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem?
For JAR hell, I thought even released versions can cause problems too?
Ie, we include version X of a dep, and the app embedding solr is using version Y, and they may conflict?
Ie the classic problem: http://en.wikipedia.org/wiki/Java_Classloader#JAR_hell

Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?

Tommaso Teofili
added a comment - 07/Mar/12 20:27 Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?
also starting from 3.5.0 the UIMA dependencies' jars are released artifacts (see SOLR-2746 and thus http://mvnrepository.com/artifact/org.apache.solr/solr-uima/3.5.0 )

For JAR hell, I thought even released versions can cause problems too?

Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

Uwe Schindler
added a comment - 07/Mar/12 20:32 For JAR hell, I thought even released versions can cause problems too?
Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

yes, that's also my understanding of this issue with unreleased dependencies' jars.

Tommaso Teofili
added a comment - 07/Mar/12 20:37 The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.
yes, that's also my understanding of this issue with unreleased dependencies' jars.

Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem?

I see this entirely as as JAR hell problem. Maven just makes it easier to discover.

For JAR hell, I thought even released versions can cause problems too?

Of course - but this is an expected side-effect of a release. This whole discussion is because the commons-csv people never agreed to support the classnames solr is using. If they request we change the name, it seems only appropriate that we do.

Ryan McKinley
added a comment - 07/Mar/12 20:39 Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem?
I see this entirely as as JAR hell problem. Maven just makes it easier to discover.
For JAR hell, I thought even released versions can cause problems too?
Of course - but this is an expected side-effect of a release. This whole discussion is because the commons-csv people never agreed to support the classnames solr is using. If they request we change the name, it seems only appropriate that we do.

For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...

It's not that simple. We use 1.6 APIs and we have libraries that are compiled under 1.6. The version embedded in Solr is patched and backported using retroweaver if I remember correctly. Like I said, I don't have a problem if you just publish that particular artefact as is under Solr's groupid. I don't see a problem here, really. Classpath conflicts do happen but it's easier to recover from a classpath conflict than from a jarjared file which happens to use runtime reflection or something like that (just replacing strings is often not enough if class names are constructed dynamically).

Dawid Weiss
added a comment - 07/Mar/12 20:41 For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...
It's not that simple. We use 1.6 APIs and we have libraries that are compiled under 1.6. The version embedded in Solr is patched and backported using retroweaver if I remember correctly. Like I said, I don't have a problem if you just publish that particular artefact as is under Solr's groupid. I don't see a problem here, really. Classpath conflicts do happen but it's easier to recover from a classpath conflict than from a jarjared file which happens to use runtime reflection or something like that (just replacing strings is often not enough if class names are constructed dynamically).

So, I hear that you all wish that you had more control over the dependencies that you propagate when you publish to maven central. Perhaps I could help?

Option 1, which you've already detected, is to use jarjar or shade so that you can publish your fork of someone else's stuff in your own package namespace fairly painlessly, either storing it inside some other jar of yours or publishing it under your coordinates.

This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.

Now, if you don't publish one of these, anyone who uses your published artifact will get past the maven dependency graph – and then sink or swim based on whether they've made their own arrangements to get the code into their classpath. I can't help wondering how this helps with the original complaint of the Common(s)-Folk: if you release a version of Lucene that requires a forked version of Commons, and you don't change the packages, you're setting up for a classpath conflict, sooner or later, with or without maven, aren't you?

Option 3 is simply to observe that your users can always use 'exclude' to cut off following some dependency link of yours that doesn't appeal to them.

Does any of this help?

p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.

Benson Margulies
added a comment - 07/Mar/12 21:27 - edited So, I hear that you all wish that you had more control over the dependencies that you propagate when you publish to maven central. Perhaps I could help?
Option 1, which you've already detected, is to use jarjar or shade so that you can publish your fork of someone else's stuff in your own package namespace fairly painlessly, either storing it inside some other jar of yours or publishing it under your coordinates.
Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html .
This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.
Now, if you don't publish one of these, anyone who uses your published artifact will get past the maven dependency graph – and then sink or swim based on whether they've made their own arrangements to get the code into their classpath. I can't help wondering how this helps with the original complaint of the Common(s)-Folk: if you release a version of Lucene that requires a forked version of Commons, and you don't change the packages, you're setting up for a classpath conflict, sooner or later, with or without maven, aren't you?
Option 3 is simply to observe that your users can always use 'exclude' to cut off following some dependency link of yours that doesn't appeal to them.
Does any of this help?
p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.

Uwe Schindler
added a comment - 07/Mar/12 21:37 p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.
jarjar does the same. Shade is only available to Maven so not really useable for us.

I prepared my checkout for committing this. It seems that we all agree that this is, until commons-csv is released, the simpliest option. I cleaned up the JAR file a little bit and removed the empty commons directory and removed the wrong maven metadata. I also added a note to NOTICE.txt, that this JAR file is a prerelease of commons-csv and was jarjared to an internal solr package name.

Uwe Schindler
added a comment - 07/Mar/12 21:40 I prepared my checkout for committing this. It seems that we all agree that this is, until commons-csv is released, the simpliest option. I cleaned up the JAR file a little bit and removed the empty commons directory and removed the wrong maven metadata. I also added a note to NOTICE.txt, that this JAR file is a prerelease of commons-csv and was jarjared to an internal solr package name.
Any objections?

This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.

I don't like the fact that taking this approach for all of our non-Mavenized dependencies (by which I mean those that are not in Maven Central) would put a much greater burden on our users that consume Lucene/Solr artifacts via Maven.

Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users.

-1 from me for using optional dependencies to counter Maven's virality.

Steve Rowe
added a comment - 07/Mar/12 21:53
Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html .
This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.
I don't like the fact that taking this approach for all of our non-Mavenized dependencies (by which I mean those that are not in Maven Central) would put a much greater burden on our users that consume Lucene/Solr artifacts via Maven.
Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms , then run mvn -N -Pbootstrap install . Neither of these fall within the expected level of effort for Maven users.
-1 from me for using optional dependencies to counter Maven's virality.

Uwe Schindler
added a comment - 07/Mar/12 22:06 I think I leave it as it is for now. The jar file name contains the revision no of the Commons-CSV, so it should be unmodified (exept the package name).
Once Commons-CSV is released, we will drop the JAR file and replace by the official dependency.

For JAR hell, I thought even released versions can cause problems too?

Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

I see. So then we "subverted" Maven's ability to catch the
conflicting versions of the same dep, by not using the same
coordinates that commons-csv will eventually use, once they release?

Given that, as crazy as this sounds, maybe we should have published
using the same coordinates? Ie this way Maven apps would know they
had conflicting versions (one, from Solr, of what is basically a
pre-release snapshot of commons-csv, and another of a real, eventual
future commons-csv release).

Or.... maybe Maven should not make it a hard requirement that a
non-snapshot release (of Solr) cannot depend on a snapshot release (of
commons-csv)? After all, this is really a judgement call of the
committers (in Solr), as to whether a snapshot release is "good
enough"/well tested, given how the project is using it, for its
purposes/release?

Michael McCandless
added a comment - 07/Mar/12 22:09
For JAR hell, I thought even released versions can cause problems too?
Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.
I see. So then we "subverted" Maven's ability to catch the
conflicting versions of the same dep, by not using the same
coordinates that commons-csv will eventually use, once they release?
Given that, as crazy as this sounds, maybe we should have published
using the same coordinates? Ie this way Maven apps would know they
had conflicting versions (one, from Solr, of what is basically a
pre-release snapshot of commons-csv, and another of a real, eventual
future commons-csv release).
Or.... maybe Maven should not make it a hard requirement that a
non-snapshot release (of Solr) cannot depend on a snapshot release (of
commons-csv)? After all, this is really a judgement call of the
committers (in Solr), as to whether a snapshot release is "good
enough"/well tested, given how the project is using it, for its
purposes/release?
Net/net it looks like Maven's invasive/viral restrictions ("all of
your deps become publicly released", and "your deps cannot be
snapshots") are tying our hands here?
Surely this problem has come up before in Maven usage?

Benson Margulies
added a comment - 07/Mar/12 22:14 Michael,
I think you have put your finger on it. If you don't change the packages, don't change the gav. If you do change the packages, do change the gav.
None of which is to claim that Maven isn't full of flaws in this whole area; it does not model the entire world of things people need to do.

Steve Rowe
added a comment - 07/Mar/12 23:07 Steven Rowe: If I broke something with maven, can you please fix? I have no idea...
Only one problem: the name of the source file in the bootstrap profile (copies lib/ files to users' local Maven repositories) - I just committed the file renaming to both trunk and branch_3x.

Otherwise these are just classes inside the war file (essentially an implementation detail of solr).

The java classpath is not an implementation detail if you expect any of your jars to share a classloader with others. At least not until project Jigsaw is complete and Solr is using Java 8 modules.

Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

Exactly. If you publish a maven artifact under a custom groupId:artifactId the sane thing to do is make sure the classes don't overlap with the old one. Maven guards against duplicate artifacts in the same classpath. Otherwise users will implicitly get both copies on their classpath too easily.

The arguments that this is more than a maven issue are bogus.
Do you take me for a fool?

What happens if a non-maven user gets both solr-commons-csv.jar and commons-csv.jar on their classpath?

I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter.

Then specify it as <optional>true</optional> or <scope>provided</scope>. If you publish to maven central a pom.xml that says "Solr must have jar FOO!" and then complain that it is a "Maven problem" when jar FOO is actually used, then you've used maven incorrectly.

Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users.

or C)
you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.

-1 from me for using optional dependencies to counter Maven's virality.

Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.

Scott Carey
added a comment - 07/Mar/12 23:12
This is a maven issue. solr-commons-csv only exists to maven.
Otherwise these are just classes inside the war file (essentially an implementation detail of solr).
The java classpath is not an implementation detail if you expect any of your jars to share a classloader with others. At least not until project Jigsaw is complete and Solr is using Java 8 modules.
Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.
Exactly. If you publish a maven artifact under a custom groupId:artifactId the sane thing to do is make sure the classes don't overlap with the old one. Maven guards against duplicate artifacts in the same classpath. Otherwise users will implicitly get both copies on their classpath too easily.
The arguments that this is more than a maven issue are bogus.
Do you take me for a fool?
What happens if a non-maven user gets both solr-commons-csv.jar and commons-csv.jar on their classpath?
I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter.
Then specify it as <optional>true</optional> or <scope>provided</scope>. If you publish to maven central a pom.xml that says "Solr must have jar FOO!" and then complain that it is a "Maven problem" when jar FOO is actually used, then you've used maven incorrectly.
Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users.
or C)
you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.
-1 from me for using optional dependencies to counter Maven's virality.
Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.

or C)
you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.

-1 from me for using optional dependencies to counter Maven's virality.

Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.

My implicit assumption was that as a result of using optional dependencies, we would cease publishing any third-party dependencies to Maven Central. (That's part of what I was -1'ing.)

Steve Rowe
added a comment - 08/Mar/12 01:14
or C)
you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.
-1 from me for using optional dependencies to counter Maven's virality.
Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.
My implicit assumption was that as a result of using optional dependencies, we would cease publishing any third-party dependencies to Maven Central. (That's part of what I was -1'ing.)

If you don't change the packages, don't change the gav. If you do change the packages, do change the gav.

+1

And... I hate to heap even more invasiveness/virality onto Maven, but... it
seems like if Maven also checked across artifacts
(groupID+artifactID?) for package name conflicts... that would prevent
future cases of accidentally releasing a private dependency? (Hmmm,
assuming commons-csv releases snapshots...).

Michael McCandless
added a comment - 08/Mar/12 11:32 If you don't change the packages, don't change the gav. If you do change the packages, do change the gav.
+1
And... I hate to heap even more invasiveness/virality onto Maven, but... it
seems like if Maven also checked across artifacts
(groupID+artifactID?) for package name conflicts... that would prevent
future cases of accidentally releasing a private dependency? (Hmmm,
assuming commons-csv releases snapshots...).

Sami Siren
added a comment - 09/Mar/12 09:04 I discovered this issue because someone who imported solr-commons-csv in its project reported an issue ( SANDBOX-330 ), an infinite loop with trailing comments, that was already fixed on the trunk
So is the current solr version affected by this? Should it be updated?