eric baldeschwieler
added a comment - 11/Jul/12 04:17 What is the driving use case?
I'd suggest that anyone who wants the data encrypted on the wire will want it encrypted at rest on both sides as well. The data is just as vulnerable there.
I wonder if we can come up with an approach that just introduces new plugins and doesn't add any hadoop code? The right thing is probably to use the compression codecs to encrypt on the way to disk.
thoughts?

Alejandro Abdelnur
added a comment - 11/Jul/12 17:04 @eric14,
The driving use case is to avoid data spoofing while on the wire.
Agree, encrypting data at both sides is the obvious follow-up to this JIRA in order to have end-to-end over-the-wire confidentiality.
In current Hadoop, as you suggest, you can use compression codecs to do encryption on both sides.
However, you cannot do that for the shuffle; hence this JIRA tackles the shuffle case first.
Of course, this functionality would be disabled by default, even if Kerberos security is enabled. You'll need to set another knob to enable shuffle encryption.
Hope this clarifies.

eric baldeschwieler
added a comment - 11/Jul/12 17:28 Anyone I've talked to who has been concerned about over-the-wire encryption has also raised the on-disk issue. So, would it be better to put the encryption where we write to disk, where we already compress? It seems like this might be less invasive and would be more complete.
It has downsides if you do lots of spills, but it is much more complete. The compaction issue can be addressed down the road by the collation work folks are already exploring.
—
Do you already have an HDFS solution in place? This only covers a fraction of the data traffic.

Alejandro Abdelnur
added a comment - 11/Jul/12 18:07 @eric14, my bad, you can use a codec in the shuffle. When looking into this I discarded that option; let me recall exactly why and I'll follow up.

Alejandro Abdelnur
added a comment - 13/Jul/12 06:11 When looking at encryption on the wire for the shuffle, the alternatives that popped up were transport encryption (HTTPS) and data/spills encryption (doable via a codec).
Using HTTPS requires improving the Fetcher/ShuffleHandler (Netty/JDK-URL) to use HTTPS and configuring certificates. It is a well understood/standard/proven technology and gives you end-to-end confidentiality, integrity, and server authentication (and optionally client authentication), in an out-of-the-box manner without room to get things wrong. The server certificates' private keys are out of reach of job tasks (they are used by the NM, similar to Kerberos keytabs).
Using a codec requires (leveraging an existing plugin point) a compression codec implementation that adds cipher-stream wrappers to the original streams and, in addition, could delegate to a real compression codec (in order not to lose compression when doing encryption). This requires us to choose a Cipher implementation by hand (which I'm not an expert on), and I'm not sure which one would be the best choice or what the weaknesses of each are ( http://en.wikipedia.org/wiki/Stream_cipher#Comparison_Of_Stream_Ciphers ). Using a cipher on its own will provide confidentiality, but it would not provide integrity or man-in-the-middle protection (unless we end up implementing something like TLS). In addition, both ends are controlled by job tasks, so it becomes the user's responsibility to create/distribute/protect the secrets that are the basis of confidentiality. Finally, with the codec approach the HTTP shuffle request/response headers go in the clear, which could enable a man-in-the-middle attack.
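For illustration, the codec idea described above could be sketched with javax.crypto cipher streams. Everything here (class name, hard-coded key/IV, the AES/CTR choice) is hypothetical and not from any patch on this JIRA; as the comment notes, this gives confidentiality only, with no integrity or man-in-the-middle protection:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.*;
import java.security.GeneralSecurityException;

// Hypothetical sketch of the "encryption codec" approach: wrap the spill
// streams with cipher streams. Key/IV distribution is the hard part the
// comment calls out; they are hard-coded here for illustration only.
public class CryptoCodecSketch {
  private static final byte[] KEY = "0123456789abcdef".getBytes(); // 128-bit AES key (demo only)
  private static final byte[] IV  = new byte[16];                  // fixed IV: NOT safe in practice

  private static Cipher cipher(int mode) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding"); // stream-style mode, no padding
    c.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
    return c;
  }

  public static OutputStream wrapForWrite(OutputStream out) throws GeneralSecurityException {
    return new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE));
  }

  public static InputStream wrapForRead(InputStream in) throws GeneralSecurityException {
    return new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE));
  }

  // Round-trip helper: encrypt a byte[] through the wrapped stream, then decrypt it.
  public static byte[] roundTrip(byte[] data) throws Exception {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    OutputStream enc = wrapForWrite(buf);
    enc.write(data);
    enc.close();
    InputStream dec = wrapForRead(new ByteArrayInputStream(buf.toByteArray()));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    int b;
    while ((b = dec.read()) != -1) plain.write(b);
    return plain.toByteArray();
  }
}
```

Note that nothing here authenticates either endpoint, which is exactly why the thread settles on HTTPS.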

Alejandro Abdelnur
added a comment - 16/Jul/12 20:23 Patch with complete implementation. Introducing an SSLFactory class in common so it can be used by the follow-up HADOOP-8581. The patch does not have documentation yet; I'll work on that next.

Alejandro Abdelnur
added a comment - 17/Jul/12 05:01 And not taking care of most of the javac warnings: there are a bunch of them because of the direct use of SunX509 classes, but there is no alternative for this.

Alejandro Abdelnur
added a comment - 17/Jul/12 17:45 And now with testcases passing (the encrypted shuffle testcase was leaving behind a core-site.xml that was being picked up by other testcases).
Forgot to mention before, all this work is based on an initial implementation by Tom White.

Alejandro Abdelnur
added a comment - 17/Jul/12 21:50 Same as last patch, just adding a comment about the performance impact (per Devaraj's suggestion):
Using encrypted shuffle will incur a significant performance impact. Users should profile this and potentially reserve 1 or more cores for encrypted shuffle.

Tom White
added a comment - 17/Jul/12 22:43 This looks good to me (although, as Alejandro mentioned, I have worked on an earlier version of this, so someone else should review it too). A few minor things I noticed:
SSLFactory is in a mapreduce package, but in the common project. Just move it to org.apache.hadoop.security.ssl?
Mark SSLFactory.resolvePropertyName with the VisibleForTesting annotation.
ReloadingX509TrustManager allows 'this' to escape in its constructor. Perhaps give it a separate initialization method to start the reloader.
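Tom's 'this'-escape point can be sketched as follows. The class shape is illustrative, not the actual ReloadingX509TrustManager code: the constructor only assigns fields, and a separate init() starts the reloader thread, so no partially constructed object is ever visible to it:

```java
// Illustrative reloader skeleton: the constructor must not start the thread,
// because the thread could then observe 'this' before construction finishes.
public class ReloaderSketch {
  private final long interval;
  private Thread reloader;
  private volatile boolean running;

  public ReloaderSketch(long intervalMillis) {
    this.interval = intervalMillis;   // constructor only assigns state...
  }

  public synchronized void init() {   // ...and the caller starts the thread explicitly
    running = true;
    reloader = new Thread(() -> {
      while (running) {
        try { Thread.sleep(interval); } catch (InterruptedException e) { return; }
        // reload the truststore here
      }
    });
    reloader.setDaemon(true);
    reloader.start();
  }

  public synchronized void destroy() throws InterruptedException {
    running = false;                  // stop flag is volatile, seen by the thread
    if (reloader != null) {
      reloader.interrupt();
      reloader.join();
    }
  }

  public boolean isRunning() { return running; }
}
```

This is also the shape of the init()/destroy() pair Alejandro adds later in the thread to shut the reload thread down gracefully.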

Alejandro Abdelnur
added a comment - 17/Jul/12 22:55 Thanks for the review Tom. I'll integrate your changes. After a chat with Devaraj and another with you, I'll do some refactoring in how the keystores are produced to enable pluggable alternate implementations (a follow-up JIRA will cover creating the certs in the keystore using the jobtoken secrets).

Alejandro Abdelnur
added a comment - 18/Jul/12 00:04 addressing Tom's comments and making the keystores pluggable (to later enable other mechanisms such as jobtoken to generate certificates on the fly).

Todd Lipcon
added a comment - 19/Jul/12 01:41
the reformatting of ssl-client.xml.example and ssl-server.xml.example makes it a little hard to read the diff. Is it necessary to reindent, etc?
Style:
+ * if the trust certificates keystore file changes, the trustmanager
+ * is refreshed with the new trust certificate entries (using a
+ * {@link ReloadingX509TrustManager} trustmanager).
Formatting can be improved here - eg TrustManager is a java class, so should probably
{@link}
it to make it clear you're talking about a specific class and not an abstract concept. (as someone who doesn't know the SSL APIs well, it would make it easier to read)
+ SSLFactory.Mode mode =
+ SSLFactory.Mode.valueOf(conf.get(SSLFactory.SSL_FACTORY_MODE));
Why are we passing the factory mode through the configuration, instead of just making it a parameter for init()? It seems a little fragile/unnecessary, and a bit confusing since it's not a parameter that the user sets.
+ String keystoreType =
+ conf.get(resolvePropertyName(mode, SSL_KEYSTORE_TYPE_TPL), "jks" );
What's jks? You also use the term "jks" in the conf files, but I don't know what it refers to (again, as an SSL n00b). Improvements:
– in the config file where you say "default value is 'jks'", add "which enables the blah blah type key store" and some reference to what it means?
– in the code, add a constant SSL_KEYSTORE_TYPE_DEFAULT, and javadoc with a pointer to what jks is.
+ String keystoreLocation = conf.get(
+ resolvePropertyName(mode, SSL_KEYSTORE_LOCATION_TPL), "");
+ keystorePassword = conf.get(
+ resolvePropertyName(mode, SSL_KEYSTORE_PASSWORD_TPL), "").toCharArray();
+
+ LOG.debug(mode.toString() + " KeyStore: " + keystoreLocation);
+
+ InputStream is = new FileInputStream(keystoreLocation);
If this property isn't set, you'll end up passing an empty string to the FileInputStream constructor, which will end up giving a hard-to-diagnose message. Check whether keystoreLocation.isEmpty(), and if it is, throw an appropriate exception including the config name.
Same goes for trustStoreLocation
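A minimal sketch of the fail-fast check being suggested; the helper and the config key name used in the test are made up for illustration:

```java
import java.io.FileNotFoundException;

// Reject an unset/empty keystore location with a message naming the config
// key, instead of letting FileInputStream produce a cryptic error later.
public class KeystoreConfigCheck {
  public static String requireLocation(String value, String confKey)
      throws FileNotFoundException {
    if (value == null || value.isEmpty()) {
      throw new FileNotFoundException(
          "Keystore location not set, please configure '" + confKey + "'");
    }
    return value;
  }
}
```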
Style nit: you have several javadocs for getters which are redundant, eg:
+ /**
+ * Returns the trustmanagers for trusted certificates.
+ *
+ * @return the trustmanagers for trusted certificates.
+ */
No need to repeat yourself twice - just have the @return line in the javadoc and not the line above it, IMO.
+ * @param type type of truststore file, typically 'JKS'.
Elsewhere in the code you have "jks" (lower case). Is it case sensitive?
+ } catch (Exception ex) {
+ trustManagerRef.set(null);
+ LOG.warn("Could not load truststore, using empty one : " + ex.toString(),
+ ex);
+ }
Why should you use an empty one? If the user configures a path to a trust store, and then starts up but the store can't be found, I don't think we should ignore their config. Better to bail out on startup. Then all of the null checks later on in this file could be removed.
+ FileInputStream in = new FileInputStream(file);
+ try {
+ ks.load(in, password.toCharArray());
+ lastLoaded = file.lastModified();
I think you need to set lastLoaded before opening the file. Otherwise there's a race where you can miss a change to the file.
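The race being described can be closed by recording the modification time before opening the stream: if the file is replaced mid-load, the next poll still sees it as newer. A stand-alone sketch (not the patch code; class and method names are illustrative):

```java
import java.io.*;

// Capture lastModified *before* opening the stream. If the file is atomically
// replaced between the open and the load, the replacement's mtime is newer
// than the recorded one, so the change is re-detected on the next poll.
public class MtimeRaceSketch {
  private long lastLoaded;

  public byte[] load(File file) throws IOException {
    lastLoaded = file.lastModified();          // record first
    try (FileInputStream in = new FileInputStream(file)) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
      return out.toByteArray();                // stand-in for ks.load(in, password)
    }
  }

  public boolean needsReload(File file) {
    return file.lastModified() > lastLoaded;
  }
}
```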
+ } catch (Exception ex) {
+ throw new RuntimeException(ex);
+ }
Maybe use Throwables.propagateIfPossible here to propagate IOException and GeneralSecurityException first? Seems strange to throw RTE for an IOE when you declare that the method throws IOE.
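A plain-Java sketch of the propagation behavior being suggested, without the Guava dependency (the multi-catch stands in for Throwables.propagateIfPossible; the method body and its forced-failure parameter are illustrative, not the patch code):

```java
import java.io.IOException;
import java.security.GeneralSecurityException;

// Rethrow the declared checked types and unchecked exceptions untouched;
// only wrap truly unexpected exceptions in a RuntimeException.
public class PropagateSketch {
  public static void load(Exception cause)
      throws IOException, GeneralSecurityException {
    try {
      throw cause;                       // stand-in for ks.load(...) failing
    } catch (IOException | GeneralSecurityException | RuntimeException e) {
      throw e;                           // declared or unchecked: propagate as-is
    } catch (Exception e) {
      throw new RuntimeException(e);     // anything else: wrap as a last resort
    }
  }
}
```

The point is that a caller declared to throw IOException should actually see an IOException, not an RTE wrapping one.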
+ @SuppressWarnings({ "InfiniteLoopStatement" })
+ public void run() {
There's really no way we can get a cleanup hook here to stop the thread at shutdown?
+ } catch (Exception ex) {
+ trustManagerRef.set(null);
+ LOG.warn("Could not load truststore, using empty one : " +
+ ex.toString(), ex);
+ }
If it fails to reload, why not stick to the previous version of the reference instead of falling back to empty?
+ * This SSLFactory uses a {@link ReloadingX509TrustManager} intance,
+ * which reloads public keys if the truststore file changes.
+ * <p/>
+ * This factory is used to configure HTTPS in Hadoop HTTP based endpoints, both
+ * client & server.
Typo: 'intance'
Style: don't abbreviate the word "and" as '&' – it's invalid javadoc and also just harder to read.
+ public enum Mode { CLIENT, SERVER }
This can be static right? Also, since it's an inner class, you need an @InterfaceAudience.Private on it, too, or else it shows up in the public javadoc. (unfortunately the annotation doesn't get inherited from its outer class)
+ public static final String SSL_ENABLED =
+ "hadoop.ssl.enabled";
...
+ public static final String KEYSTORES_FACTORY_CLASS =
+ "hadoop.ssl.keystores.factory.class";
Style: rename all these constants to end in _KEY so it's clear it's the conf keys and not the values themselves.
+ Configuration sslConf = new Configuration(false);
+ sslConf.setBoolean(SSL_REQUIRE_CLIENT_CERT, requireClientCert);
+ String sslConfResource;
+ if (mode == Mode.CLIENT) {
+ sslConfResource = conf.get(SSL_CLIENT_CONF, "ssl-client.xml");
+ } else {
+ sslConfResource = conf.get(SSL_SERVER_CONF, "ssl-server.xml");
+ }
+ sslConf.addResource(sslConfResource);
Move this into a private method readSslConfiguration(mode)? Also, indentation is off in one line here.
Extract the creation of SSLHostnameVerifier into a new method, as well.
+<property>
+ <name>hadoop.ssl.enabled</name>
+ <value>false</value>
+ <description>Whether encrypted shuffle is enabled</description>
+</property>
If this is specific to encrypted shuffle, the name should reflect that, and it should be in mapred-default.xml, not core-default.xml
I wonder: is there a use case for having this setting per-job in some clusters? Either way, it should definitely be an MR config and not a core config.
+ The keystores factory to use for retriving certificates.
Typo: retriving
+
+ public class KeyStoreUtil {
Rename to KeyStoreTestUtil, since this is a test-only class.
+ FileOutputStream out = new FileOutputStream(filename);
+ ks.store(out, password.toCharArray());
+ out.close();
Need try..finally. A few other places later in this same file that need this fix.
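The try..finally fix being asked for, sketched with a plain byte write standing in for ks.store (the helper name is hypothetical):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// The quoted test-util code closes the stream only on success; here the
// close is guaranteed even if the write throws.
public class TryFinallySketch {
  public static void writeBytes(File file, byte[] data) throws IOException {
    FileOutputStream out = new FileOutputStream(file);
    try {
      out.write(data);        // stand-in for ks.store(out, password.toCharArray())
    } finally {
      out.close();            // runs on both the normal and the exceptional path
    }
  }
}
```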
+ // Wait so that the file modification time is different
+ Thread.sleep((tm.getReloadInterval() + 2) * 1000);
You have this in a bunch of places in the test - but if you set the last modified time of the file, as you do elsewhere in the test, then you shouldn't have to sleep, except for waiting for it to notice the reload. If you change the reload interval to be specified in millis instead of seconds, then you could set it to 10ms or so for the tests and these tests would run a lot faster.
In TestSSLFactory, you use Assert.fail() in a lot of places after catching an Exception. Instead, just let the exception fall through which will fail the test, with the advantage that we'll actually have the stack trace of the exception instead of an unexplained failure message. In the cases where you expect an exception, use GenericTestUtils.assertExceptionContains to check the text.
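A stdlib stand-in for the pattern being recommended: unexpected exceptions propagate with their stack trace, and only the expected one is caught and its message checked. GenericTestUtils.assertExceptionContains is the real Hadoop helper; everything else here (class, parsePort) is illustrative:

```java
// Negative-test pattern: catch only the expected exception and inspect its
// message; anything unexpected propagates, so a failure keeps its stack trace.
public class NegativeTestSketch {
  // Stdlib stand-in for GenericTestUtils.assertExceptionContains.
  public static void assertExceptionContains(String fragment, Throwable t) {
    if (t.getMessage() == null || !t.getMessage().contains(fragment)) {
      throw new AssertionError(
          "Expected message containing '" + fragment + "' but got: " + t, t);
    }
  }

  // Example operation under test: rejects null input.
  public static int parsePort(String s) {
    if (s == null) throw new IllegalArgumentException("port must not be null");
    return Integer.parseInt(s);
  }
}
```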
+ writeFuture = ch.write(new ChunkedFile(spill, info.startOffset,
+ info.partLength, 8192));
What's 8192 here? Need a constant or config. If it's a buffer size, I'd think 64K or 128K would probably perform better, based on my general experience with java IO.
In the docs, under the ssl-client configuration, it references ssl-server.xml in one spot.
Typo: "trutsstore" in one place.
Typo: "will incurs in a significant"
General comment: what's the point of client certificates here? They're not a secret, since all users share them. I would think they'd need to be shipped with the job in the distributed-cache, if the use case is for cross-cluster authentication in tools like distcp, since different users may want to distcp from different clusters, and also have different access controls.

Alejandro Abdelnur
added a comment - 20/Jul/12 00:24 @todd, thanks for the detailed review.
I've integrated most of your comments.
The javadoc style for 'Returns BLAH' and then '@return BLAH' is Sun javadoc style.
keystore type is case insensitive, 'jks' is the same as 'JKS'. Still, I've lowercased that javadoc.
the ReloadingX509TrustManager will work with an empty keystore if the keystore file is not available at initialization time, and if the keystore file becomes available later on, it will be loaded. WARNs are logged while the file is not present, so it won't go unnoticed.
added init()/destroy() methods where appropriate to be able to shut down the reload thread gracefully.
If reload() fails to reload the new keystore, it assumes there are no certs and runs empty until the next reload attempt. That seems a safer assumption than continuing to run with obsolete keys.
While hadoop.ssl.enabled only applies to the shuffle, the intention is to use it for the rest of the HTTP endpoints. Thus, a single knob would enable SSL. That is why the name of the property and its location (in core-default.xml).
Regarding having it per job: this would require the shuffle handler to serve both HTTP and HTTPS and to deny the endpoint the job is not configured to use, which would require the shuffle handler to have access to that piece of job configuration. I'd say it is out of scope for this patch and could be a future improvement.
In TestSSLFactory, the Assert.fail() statements are in sections the test should not reach; they are used for negative tests.
Client certs are disabled by default. If they are per job, yes, they could be shipped via the distributed cache. This would require an alternate implementation of the KeyStoresFactory, so the mechanism is already in place.

Todd Lipcon
added a comment - 20/Jul/12 00:46 The javadoc style for 'Returns BLAH' and then '@return BLAH' is Sun javadoc style.
Ew. That's disgusting. Oh well.
the ReloadingX509TrustManager will work with an empty keystore if the keystore file is not available at initialization time, and if the keystore file becomes available later on, it will be loaded. WARNs are logged while the file is not present, so it won't go unnoticed.
WARNs in the logs are often not noticed. Don't you think it's simpler to just fail if the conf is not present? If someone configures this and doesn't create the file (or the file is unreadable due to a permissions error), I think it's friendlier to fail fast. Otherwise they'll just end up seeing strange downstream issues like client certs not being properly trusted, which will be more difficult to root-cause back to the trust store configuration without log spelunking.
If reload() fails to reload the new keystore, it assumes there are no certs and runs empty until the next reload attempt. That seems a safer assumption than continuing to run with obsolete keys.
My worry here is that people might be using a conf management system to push out the key store files. If the reload happens to trigger right in the middle of a conf mgmt update, and the update is non-atomic, it will see an invalid keystore. I wouldn't want the TT to revert to an empty key store until the next reload interval in that case.
While hadoop.ssl.enabled only applies to the shuffle, the intention is to use it for the rest of the HTTP endpoints. Thus, a single knob would enable SSL. That is why the name of the property and its location (in core-default.xml).
Given it doesn't currently affect the other HTTP endpoints, I find this very confusing. Why not make a separate config for now, and then once it affects more than just the shuffle, you can change the default for mapred.shuffle.use.ssl to ${hadoop.use.ssl} to pick up the system-wide default.
In TestSSLFactory, the Assert.fail() statements are in sections the test should not reach; they are used for negative tests.
I get that. But, if the test breaks, you'll end up with a meaningless failure, instead of a message explaining why it failed. If you let the exception fall through, then the failed unit test would actually have a stack trace that explains why it failed, which aids in debugging.
Client certs are disabled by default. If they are per job, yes, they could be shipped via the distributed cache. This would require an alternate implementation of the KeyStoresFactory, so the mechanism is already in place.
Does it need an alternate implementation? The distributed cache files can be put on the classpath already, in which case the existing keystore-loading code should be able to find them. The only change would be in the documentation – explaining that the client should ship the files via distributed cache rather than putting them in HADOOP_CONF_DIR. Why wouldn't that be enough?

Todd Lipcon
added a comment - 20/Jul/12 19:52
The AtomicBoolean running doesn't need to be an atomic boolean, since it's already volatile. You can just use a volatile boolean here.
in loadTrustManager, you need to get file.lastModified before you even open the FileInputStream. Otherwise if the file is replaced in between opening the stream and you getting the mtime, you'll read the old version of the file but think you read the new one (assuming an atomic rename-over-old-file replacement)
Otherwise looks good to me.

Alejandro Abdelnur
added a comment - 23/Jul/12 18:06 The patch for branch-1 opens an additional port with SSL just for encrypted shuffle; the shuffle servlet (MapOutputServlet) refuses to serve shuffle over the clear HTTP endpoint if SSL is enabled:
if (shuffleSsl && !request.isSecure()) {
  response.sendError(HttpServletResponse.SC_FORBIDDEN,
      "Encrypted Shuffle is enabled, shuffle is only served over HTTPS");
  return;
}

eric baldeschwieler
added a comment - 23/Jul/12 20:02 Why would we add this to 1? I understand mainline, but not anything else at this point. I don't think this is a complete approach. It's going to take additional work to finish this. Why add the complexity to a stabilized code line?

Alejandro Abdelnur
added a comment - 23/Jul/12 20:45 @eric14, you are correct, on its own this is an incomplete feature and it will require more work. I've posted the patch for branch-1 in case somebody is interested.

Todd Lipcon
added a comment - 23/Jul/12 23:24 PS I thought we had a process for adding things to 1, which was to propose them during next release planning.
That process was proposed but it hasn't been followed at all. There have been plenty of new features going into branch-1 without prior discussion.