Implement the rest (writable APIs) of RM web-services

Details

Type: New Feature

Status: Open

Priority: Major

Resolution: Unresolved

Affects Version/s: None

Fix Version/s: None

Component/s: None

Labels: None

Description

MAPREDUCE-2863 added the REST web-services to RM and NM. However, all the APIs added there focused only on obtaining information from the cluster. We need the following REST APIs to finish the feature:

Application Client protocol: For resource scheduling by apps written in an arbitrary language. We will have to think about throughput concerns.

ContainerManagement Protocol: Again for arbitrary language apps.

One important thing to note here is that we already have client libraries for all three protocols that do some heavy lifting. One part of the effort is to figure out whether they can be made any thinner and/or how web-services will implement the same functionality.

Activity

Vinod Kumar Vavilapalli
added a comment - 10/Feb/14 19:16 To be clear, the primary focus of this JIRA is definitely application submission, so that tools outside YARN can easily submit and monitor applications without writing against the Java or PB interfaces.
The latter two protocols arguably do not need REST APIs as much, given that we already have a cross-platform story in terms of protocol-buffers-based interfaces.

Alejandro Abdelnur
added a comment - 11/Feb/14 18:34 Before we start coding work on this, it would be great to see how security will be handled (authentication, ACLs, tokens, etc.).
I'm a bit concerned about introducing a second protocol everywhere. From a maintenance and security-risk point of view, it doubles our development/support efforts. Granted, HDFS offers data over RPC and HTTP. But HTTP, when using HttpFS (how I recommend using HTTP access), is a gateway that ends up doing RPC to HDFS. Thus the only protocol accessing HDFS is RPC.
Have we considered having a C implementation of Hadoop RPC? Combined with the multi-platform support of protobuffers, that may give us the multi-platform support we are aiming for with a single protocol interface.
Thoughts?

Vinod Kumar Vavilapalli
added a comment - 21/Feb/14 19:58 I missed your comment, apologies.
We already have a clear security story for web UI and web-services. This JIRA is not adding any more end-points than what are already present.
Similarly, we already expose the same information via RPCs, web UI and web-services. So this JIRA isn't adding any more maintenance burden than is already present.
Overall, I've seen a lot of use-cases where, at least for the client-side APIs, users want to use web-services directly instead of writing code. That's the goal of this JIRA. As I already pointed out, the remaining protocols from this JIRA are in the grey area of whether to do them at all, and I am not focusing on them right away.

Vinod Kumar Vavilapalli
added a comment - 21/Feb/14 19:59 There is a related question here - whether to implement web-services inside RM or pull them out and implement them in a proxy layer similar to WebHDFS. But that's a larger issue with code and feature-set that is already present today. Will create a separate ticket for that discussion.

Steve Loughran
added a comment - 10/Dec/14 14:51 Now that the functionality is in, is there going to be a replacement for YarnClientImpl which we can use to switch to the REST APIs? I can't see an issue for it.
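
For readers wondering what submission over REST might look like from outside the Java client, here is a minimal Python sketch. It only builds the HTTP requests and does not send them. The endpoint paths and JSON field names follow the ResourceManager REST API documentation as best recalled and should be checked against your Hadoop version. The host name, application id, and command below are hypothetical, and security (authentication, tokens) is not covered.

```python
import json
import urllib.request

# Hypothetical RM web address; 8088 is the usual default RM web port.
RM_BASE = "http://rm.example.com:8088"


def build_new_application_request(rm_base):
    """Request that asks the RM for a fresh application id (assumed endpoint)."""
    return urllib.request.Request(
        rm_base + "/ws/v1/cluster/apps/new-application",
        data=b"",
        method="POST",
    )


def build_submit_request(rm_base, app_id, name, command, memory_mb=1024, vcores=1):
    """Request that submits an application using a previously obtained id.

    The body is a small subset of the submission context; field names are
    assumptions based on the RM REST API documentation.
    """
    body = {
        "application-id": app_id,
        "application-name": name,
        "application-type": "YARN",
        "am-container-spec": {"commands": {"command": command}},
        "resource": {"memory": memory_mb, "vCores": vcores},
    }
    return urllib.request.Request(
        rm_base + "/ws/v1/cluster/apps",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def state_url(rm_base, app_id):
    """URL for polling the application state (assumed endpoint)."""
    return "{}/ws/v1/cluster/apps/{}/state".format(rm_base, app_id)
```

A client would send the first request with `urllib.request.urlopen`, read the returned application id from the JSON response, pass it to `build_submit_request`, and then poll `state_url` to monitor progress.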