User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies, or updating the version of existing ones, may conflict with the versions already on an application's classpath.

Policy

Currently, there is NO policy on when Hadoop's dependencies can change.

Furthermore, we have *already* changed our classpath in hadoop-2.x. Again, as I pointed out in the previous thread, here is the precedent:

On Jun 21, 2014, at 5:59 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

thanks,
Arun

CONFIDENTIALITY NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

That classpath policy was explicitly added because we can't lock down our dependencies for security/bug fix reasons, and also because if we do update something explicitly, their transitive dependencies can change - beyond our control.

https://issues.apache.org/jira/browse/HADOOP-9555 is an example of this: an update of ZK explicitly to fix an HA problem. Are there changes in its dependencies? I don't know. But we didn't have a choice to update if we wanted NN & RM failover to work reliably, so we have to take any other changes that went in.

JDK upgrades can be viewed as an extension of this - we are changing the base platform that Hadoop runs on. More precisely, for the Java 6 -> Java 7 update, we are reflecting the fact that nobody is running in production on Java 6.

What we did there was issue a warning in 0.18 that it would be the last Java 5 version; 0.19 moved up - we can do the same for a Hadoop 2.x release at some point this year.

On 24 June 2014 11:43, Arun C Murthy <[EMAIL PROTECTED]> wrote:

Here's what I am behind - a modified proposal C.

- Overall, I wouldn't think about EOL of JDK7 and/or JDK8 specifically, given how long it has taken for the JDK6 life-cycle to end. We should try to focus on JDK7 only for now.
- As we have seen, a lot (majority?) of orgs on Hadoop have moved beyond JDK6 and are already running on JDK7. So upgrading to JDK7 is more of a reflection of reality (to quote Steve) than it in itself being a disruptive change.
- We should try decoupling the discussion of major releases from JDK upgrades. We have seen individual libraries getting updated right in the 2.x lines as and when necessary. Given the new reality of JDK7, I don't see the 'JDK change' as much different from the library upgrades.

We have seen how long it has taken (and is still taking) users and organizations to move from Hadoop 1 to Hadoop 2. A Hadoop 3/4 that adds nothing other than a JDK upgrade would be a big source of confusion for users. A major version update is also seen as an opportunity for devs to break APIs. Unless we have groundbreaking 'features' (like YARN or wire-compatibility in Hadoop-2) that a majority of users want and that specifically warrant incompatible changes in our APIs or wire protocols, we are better off separating the major-version update discussion into its own.

Irrespective of all this, we should actively get behind better isolation of user classes/jars from the MapReduce classpath. This one's been such a long-running concern, it's not funny anymore.

Thanks,
+Vinod

On Jun 24, 2014, at 11:17 AM, Andrew Wang <[EMAIL PROTECTED]> wrote:


While we haven't codified this in our compatibility guidelines, dropping a Java version seems to me like a change that needs to happen alongside a major release. In plain talk, it has the ability to break everything for users who aren't doing anything particularly unreasonable.

I don't think we should accept Hadoop's compatibility behavior 6 years ago as precedent for what we can do now. That was before Hadoop 1.0. And we probably have several orders of magnitude more production users.

I also don't think we should accept library upgrades as precedent. While this may make sense in specific situations, I definitely don't think this is OK in general. I'd be very nervous about updating Guava outside of a major version upgrade.

Lastly, I think the claim that nobody is running in production on Java 6 is unsubstantiated.

We need to think about a JDK upgrade in terms of what its implications are for users, not in terms of what other kinds of compatibility we've broken that's loosely analogous.

On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this to me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines:

- Existing MapReduce, YARN & HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported.

Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.

Sticking to the theme of "work unmodified", let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Else, there's the potential for nasty surprises in a supposedly "minor" release.

That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone's moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and the new NIO stuff in Hadoop code (a minor draw, I guess).
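For concreteness, the JDK7 idioms mentioned above (multi-catch, try-with-resources, the diamond operator) look roughly like this. This is an illustrative sketch, not Hadoop code; the class and method names are invented for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Jdk7Idioms {
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();            // JDK7 diamond operator
        // JDK7 try-with-resources: the reader is closed automatically.
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException | RuntimeException e) {       // JDK7 multi-catch
            throw new IllegalStateException(e);
        }
        return lines;
    }
}
```

None of these compile under `-source 1.6`, which is exactly why allowing them is coupled to dropping JDK6.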

My higher-level goal though is to avoid going through this same pain again when JDK7 goes EOL. I'd like to do a JDK8-based release before then for this reason. This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in the future, and it seems like a better place to focus our efforts. I was also hoping it'd be realistic to fix our classpath leakage by then, since then we'd have a nice, tight, future-proofed new major release.

After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop 2 for the following reasons:

* Existing Hadoop 2 releases and related projects are running on JDK7 in production.
* Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working.
* Unlike many of the 3rd party libraries used by Hadoop, the JDK is much stricter about backwards compatibility.

IMPORTANT: I take this as an exception and not as a carte blanche for 3rd party dependencies and for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs).

Even for Hadoop 2.5, I think we could do the move:

* Create the Hadoop 2.5 release branch.
* Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6 to ensure no JDK7 language/API feature creeps into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
* Sanity tests for the Hadoop 2.5.x releases should be done with JDK7.
* Apply Steve's patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7.
* Starting from Hadoop 2.6 we support JDK7 language/API features.
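The nightly JDK6 guard job in the second bullet could be sketched as a CI build step along these lines. Treat this as a configuration sketch, not the actual job definition: the JAVA_HOME install paths are hypothetical and would need to match the build machines.

```shell
# Nightly guard job (sketch): build and test the 2.5 branch under JDK6 so
# that any JDK7-only language feature or API use fails the build immediately.
set -e

export JAVA_HOME=/usr/lib/jvm/java-6-sun      # hypothetical JDK6 install path
mvn clean install

# Release sanity testing happens under JDK7, per the plan above.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle   # hypothetical JDK7 install path
mvn verify
```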

Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with JDK6 & JDK7, and that all tests towards the release are done with JDK7.

Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7.

+1, though I think 2.5 may be premature if we want to send a warning note of "last ever". That's an issue for follow-on "when in branch 2".

Guava and protobuf.jar are two things we have to leave alone, with the first being unfortunate, but their attitude to updates is pretty dramatic. The latter? We all know how traumatic that can be.

-Steve

On 24 June 2014 16:44, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

Alejandro,

On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

+1 - I think we are all on the same page here. Fully agree.

+1. Agree again - let's just wait/watch.

From the thread I've become more convinced that (as you've noted before), since we are at the bottom of the stack, we need to be more conservative.

From http://www.oracle.com/technetwork/java/eol-135779.html, it looks like April 2015 is the *earliest* Java7 will EOL. Java6 EOL was Feb 2011 and we are still debating whether we can stop supporting it. So, my guess is that we will support Java7 at least for a year after its EOL, i.e. till sometime in early 2016. It's just practical.

Net - We really don't have a good idea when a significant portion of users will actually migrate to Java 8. W.r.t. Java7, this took nearly 3 years after Java6 EOL. So for now, let's just wait & see how things develop in the field.

The mechanics make perfect sense to me. I think we should probably think a bit more on whether we drop support for JDK6 in hadoop-2.6 or hadoop-2.7.

I'd like to add one more:

* Sometime soon (within a release or two) after we actually drop support for Java6 and move branch-2 to JDK7, let's also start testing on Java8.

This way we will be ready for Java8 early regardless of when we stop support for Java7. Dropping Java7 is a bridge we can cross when we come to it.

thanks,
Arun

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


Furthermore, it's probably not well known that in hadoop-2 the user application (MR or otherwise) can also pick the JDK version by using the JAVA_HOME env for the container. So, in effect, MR applications can continue to use java6 while YARN is running java7 - this hasn't been tested extensively though. This capability did not exist in hadoop-1. We've also made some progress with https://issues.apache.org/jira/browse/MAPREDUCE-1700 to decouple user jar deps from MR system jars. https://issues.apache.org/jira/browse/MAPREDUCE-4421 also helps by ensuring MR applications can pick the exact version of MR jars they were compiled against, and not rely on cluster installs.
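As a sketch of the per-application JDK selection described above: in 2.x the task environment can be set per job, so a configuration fragment along these lines could pin MR containers to a JDK6 install while the cluster daemons run JDK7. The property names and the JDK path are illustrative assumptions; verify them against your release's mapred-default.xml before relying on this.

```xml
<!-- Illustrative fragment: pin MR task containers to a specific JDK.
     Property names and the JDK path are assumptions to verify. -->
<property>
  <name>mapreduce.map.env</name>
  <value>JAVA_HOME=/usr/lib/jvm/java-6-sun</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>JAVA_HOME=/usr/lib/jvm/java-6-sun</value>
</property>
```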

Hope that helps somewhat.

thanks,
Arun

I agree with Alejandro. Changing minimum JDKs is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'ed, is a good thing. Moving to Java 8 as a minimum seems much too aggressive and I would push back on that.

I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases.

I'm also +1 for getting us to JDK7 within the 2.x line after reading the proposals and catching up on the discussion in this thread.

Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first before we make the move? Would we switch to JDK7, but run javac with -target 1.6 to maintain compatibility for downstream projects during an interim period?

On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

Compiling with jdk7 and doing javac -target 1.6 is not sufficient; you are still using jdk7 libraries and you could use new APIs, thus breaking jdk6 both at compile and runtime.

You need to compile with jdk6 to ensure you are not running into that scenario. That is why I was suggesting the nightly jdk6 build/test jenkins job.

On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:

Alejandro,
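To make the failure mode concrete, here is a hedged sketch (not from the Hadoop codebase): this class compiles cleanly with `javac -source 1.6 -target 1.6` on a JDK7 install, because those flags constrain only the language level and bytecode version, not which platform classes you link against. The `java.nio.file` API below did not exist in JDK6, so the resulting class file fails at runtime on JDK6; only compiling against JDK6's own class library catches it.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Jdk7ApiLeak {
    // java.nio.file.* is JDK7-only: -source/-target 1.6 will not reject it,
    // because the compiler resolves it against JDK7's class library.
    public static long sizeOf(String file) throws IOException {
        return Files.size(Paths.get(file));
    }

    public static void main(String[] args) throws IOException {
        // Demo: create a small temp file and measure it.
        File tmp = File.createTempFile("leak", ".bin");
        tmp.deleteOnExit();
        FileOutputStream out = new FileOutputStream(tmp);
        out.write(new byte[]{1, 2, 3});
        out.close();
        // Works on JDK7+; on a JDK6 runtime this call would instead fail
        // with NoClassDefFoundError for java/nio/file/Paths.
        System.out.println(sizeOf(tmp.getPath()));
    }
}
```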

>>> My higher-level goal though is to avoid going through this same pain >>> again when JDK7 goes EOL. I'd like to do a JDK8-based release >>> before then for this reason. This is why I suggested skipping an >>> intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8.

I'm thinking skipping an intermediate release and leapfrogging to 3.0 makes it difficult to maintain branch-2. It's only about half a year since 2.2 GA, so we should maintain branch-2 and create bug-fix releases for the long term even if 3.0+JDK8 is released.

I understood the plan for avoiding JDK7-specific features in our code, and your suggestion to add an extra Jenkins job is a great way to guard against that. The thing I haven't seen discussed yet is how downstream projects will continue to consume our built artifacts. If a downstream project upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but their project is still building with 1.6, then it would be a nasty surprise.

These are the options I see:

1. Make sure all other projects upgrade first. This doesn't sound feasible, unless all other ecosystem projects have moved to JDK7 already. If not, then waiting on a single long-pole project would hold up our migration indefinitely.

2. We switch to JDK7, but run javac with -target 1.6 until the whole ecosystem upgrades. I find this undesirable, because in a certain sense, it still leaves a bit of 1.6 lingering in the project. (I'll assume that end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)

3. Just declare a clean break on some version (your earlier email said 2.5) and start publishing artifacts built with JDK7 and no -target option. Overall, this is my preferred option. However, as a side effect, this sets us up for longer-term maintenance and patch releases off of the 2.4 branch if a downstream project that's still on 1.6 needs to pick up a critical bug fix.

Of course, this is all a moot point if all the downstream ecosystem projects have already made the switch to JDK7. I don't know the status of that off the top of my head. Maybe someone else out there knows? If not, then I expect I can free up enough time in a few weeks to volunteer for tracking down that information.

On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

Arun, thanks for the clarification regarding MR classpaths. It sounds like the story there is improved and still improving.

However, I think we still suffer from this at least on the HDFS side. We have a single JAR for all of HDFS, and our clients need to have all the fun deps like Guava on the classpath. I'm told Spark sticks a newer Guava at the front of the classpath and the HDFS client still works okay, but this is more happy coincidence than anything else. While we're leaking deps, we're in a scary situation.
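One small diagnostic that helps when debugging this kind of classpath-order conflict is printing which jar a contested class was actually loaded from. This is a generic sketch, not Hadoop code; the Guava class name in the comment is just an example of what you might pass in:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (jar or directory) a class was loaded from,
    // or a placeholder for bootstrap classes like java.lang.String.
    public static String locationOf(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "<bootstrap classpath>" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // e.g. pass com.google.common.base.Stopwatch to see whose Guava won.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + locationOf(name));
    }
}
```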

API compat to me means that an app should be able to run on a new minor version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and have nothing break. If we muck with the classpath, my understanding is that this could break.

Owen, bumping the minimum JDK version in a minor release like this should be a one-time exception, as Tucu stated. A number of people have pointed out how painful a forced JDK upgrade is for end users, and it's not something we should be springing on them in a minor release unless we're *very* confident, like in this case.

Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on JDK7 across the CDH stack, so I think that's an indication that most ecosystem projects are ready to make the jump. Is that sufficient in your mind?

For the record, I'm also +1 on the Tucu plan. Is it too late to do this for 2.5? I'll offer to help out with some of the mechanics.

Thanks everyone for the discussion. Looks like we have come to a pragmatic and progressive conclusion.

In terms of execution of the consensus plan, I think a little bit of caution is in order.

Let's give downstream projects more of a runway.

I propose we inform HBase, Pig, Hive etc. that we are considering making 2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they are comfortable we can pull the trigger in 2.7.

thanks,
Arun


If we want, 2.7 could be a parallel release or one soon after 2.6. We could upgrade other dependencies that require JDK7 as well.

On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy <[EMAIL PROTECTED]> wrote:

Following up on ecosystem, I just took a look at the Apache trunk pom.xml files for HBase, Flume and Oozie. All are specifying 1.6 for source and target in the maven-compiler-plugin configuration, so there may be additional follow-up required here. (For example, if HBase has made a statement that its client will continue to support JDK6, then it wouldn't be practical for them to link to a JDK7 version of hadoop-common.)

On Fri, Jun 27, 2014 at 3:10 PM, Karthik Kambatla <[EMAIL PROTECTED]> wrote:

Guava is a separate problem, and I think we should have a separate discussion: "what can we do about guava"? That's more traumatic than a JDK update, I fear, as the guava releases care a lot less about compatibility. I don't worry about JDK updates removing classes like "StringBuffer" because "StringBuilder" is better.

On 27 June 2014 19:26, Andrew Wang <[EMAIL PROTECTED]> wrote:

Very good point. I think this is possible by having the app upload all the JARs... I need to experiment here myself.

+1, we've had no complaints about things not working on Java 7. It's been out a long time. If you look at our own code, the main things that broke were tests - due to junit test case ordering - and not much else.
