Details

Description

ATSv2 model lacks retrieval of list-of-all-apps, list-of-all-app-attempts and list-of-all-containers-per-attempt via REST API's. And also it is required to know about all the entities in an applications.

It is pretty much highly required these URLs for Web UI.
New REST URL would be

GET /ws/v2/timeline/apps

GET /ws/v2/timeline/apps/{app-id}/appattempts.

GET /ws/v2/timeline/apps/{app-id}/appattempts/{attempt-id}/containers

GET /ws/v2/timeline/apps/{app id}/entities should display list of entities that can be queried.

Vrushali C
added a comment - 25/Aug/16 18:15 + 1 on adding more rest endpoints. It makes things easier to query/script for.
As such in the list above, perhaps 2 and 3 can be combined such that 3 becomes query params in 2.
As discussed in today's call, this jira might help with YARN-4343 .

Looking into code base for reader for supporting these REST end points. I see that ApplicationTable schema is defined with row key clusterId!userName!flowName!flowRunId!AppId.

/apps without userId and flowName should be able to read. Currently it is mandatory that to read apps to provide userId and flowName.

/apps/{app-id}/app-attempts can be redirect to existing getEntity for YARN-APPLICATIOIN-ENTITY.

/apps/{app-id}/app-attempt/{app-attempt-id}/containers. As per design, YARN_APPLICATION_ATTEMPT and YARN_CONTAINERS are stored as general entities. Reading will be same as others user entities. I think there should be another hbase table that stores YARN specific entities. Otherwise it is big overhead for clients to parse output to know about its parent entity.

Rohith Sharma K S
added a comment - 29/Aug/16 06:52 Looking into code base for reader for supporting these REST end points. I see that ApplicationTable schema is defined with row key clusterId!userName!flowName!flowRunId!AppId .
/apps without userId and flowName should be able to read. Currently it is mandatory that to read apps to provide userId and flowName.
/apps/{app-id}/app-attempts can be redirect to existing getEntity for YARN-APPLICATIOIN-ENTITY.
/apps/{app-id}/app-attempt/{app-attempt-id}/containers . As per design, YARN_APPLICATION_ATTEMPT and YARN_CONTAINERS are stored as general entities. Reading will be same as others user entities. I think there should be another hbase table that stores YARN specific entities. Otherwise it is big overhead for clients to parse output to know about its parent entity.

Rohith Sharma K S
added a comment - 29/Aug/16 07:03 Other option is publish the entities as string rather than object. The below one is info value holds as object.
"SYSTEM_INFO_PARENT_ENTITY" : {
"type" : "YARN_APPLICATION" ,
"id" : "application_1471931266232_0024"
}
This can split into below which is very much useful in filtering.
"SYSTEM_INFO_PARENT_ENTITY_TYPE" : "YARN_APPLICATION"
"SYSTEM_INFO_PARENT_ENTITY_ID" : "application_1471931266232_0024"

If we have to support fetching all the apps within a cluster, upto a limit, this currently is not advisable in HBase implementation as it would lead to a full table scan or a very large scan. Because the API guarantees returning entities in a descending order by created time. This would mean that we will go through each row to determine the result set.

It is about taking one of the choices. Do we want to drill down to apps via flows or list all the apps in a cluster.

We can discuss further in the team on this, depending on the use case. And see if we can have some workaround.

Varun Saxena
added a comment - 29/Aug/16 16:27 Copying my comment over from YARN-5571 .
If we have to support fetching all the apps within a cluster, upto a limit, this currently is not advisable in HBase implementation as it would lead to a full table scan or a very large scan. Because the API guarantees returning entities in a descending order by created time. This would mean that we will go through each row to determine the result set.
It is about taking one of the choices. Do we want to drill down to apps via flows or list all the apps in a cluster.
We can discuss further in the team on this, depending on the use case. And see if we can have some workaround.

It is required to get all the apps in cluster. REST client will ask for limit and fromId. FromId is something which filter need to support so from where to retrieve the apps(not there now). For example , app-1 to app-10 are there. So if fromId=app-5 then apps app-6 to app-10 should be retrieved. Is there any other way to achieve this now?

If merely list of apps within a cluster is required(without any other information) we can probably serve this from app to flow table.

Not only app id's but also other information required. And currently the way I see is need to get all the flows first and for each flows, retrieve apps. This is something where client need to lot of REST calls constructing for each flows.

Rohith Sharma K S
added a comment - 29/Aug/16 17:17 It is required to get all the apps in cluster. REST client will ask for limit and fromId. FromId is something which filter need to support so from where to retrieve the apps(not there now). For example , app-1 to app-10 are there. So if fromId=app-5 then apps app-6 to app-10 should be retrieved. Is there any other way to achieve this now?
If merely list of apps within a cluster is required(without any other information) we can probably serve this from app to flow table.
Not only app id's but also other information required. And currently the way I see is need to get all the flows first and for each flows, retrieve apps. This is something where client need to lot of REST calls constructing for each flows.

One potential problem of returning all apps in a cluster/containers in an app attempt is that they may easily cause severe server slowness if not used properly. In the past this became a great pain when Apache Ambari called the default endpoint of AHS, when the total number of applications is extremely large. I agree it is quite desirable to have the ability to "list all", but at the same time we may want to consider restrict the number of returned entities unless specified by the caller, or support paging, or both?

Li Lu
added a comment - 29/Aug/16 17:53 One potential problem of returning all apps in a cluster/containers in an app attempt is that they may easily cause severe server slowness if not used properly. In the past this became a great pain when Apache Ambari called the default endpoint of AHS, when the total number of applications is extremely large. I agree it is quite desirable to have the ability to "list all", but at the same time we may want to consider restrict the number of returned entities unless specified by the caller, or support paging, or both?

FromId is something which filter need to support so from where to retrieve the apps(not there now). For example , app-1 to app-10 are there. So if fromId=app-5 then apps app-6 to app-10 should be retrieved. Is there any other way to achieve this now?

Currently not supported. What's the use case for this by the way ?
This can be supported easily though, within the scope of a flow.

Varun Saxena
added a comment - 29/Aug/16 17:59 FromId is something which filter need to support so from where to retrieve the apps(not there now). For example , app-1 to app-10 are there. So if fromId=app-5 then apps app-6 to app-10 should be retrieved. Is there any other way to achieve this now?
Currently not supported. What's the use case for this by the way ?
This can be supported easily though, within the scope of a flow.

Li Lu
added a comment - 29/Aug/16 18:20 Ah I just checked the v0 patch, the current approach to keep default limit to be 100 unless specified seems fine. Is there a way to specify "unlimit" (although not encouraged)? Thanks!

Rohith Sharma K S
added a comment - 30/Aug/16 02:32 Currently not supported. What's the use case for this by the way ?
This is required for pagination. And more importantly fromId is must for support for pagination. And also limit & fromId should be there in all the REST end points.

Rohith Sharma K S
added a comment - 30/Aug/16 10:05 Attached the patch for /containers and /appattempts. These are similar to retrieving an general entity. So full table scan do not happen. Tested the APIs in real cluster and verified.

Thanks Rohith Sharma K S! I took a deeper look into the patch. The overall approach LGTM. One concern I have is, shall we expose the semantics of "containers" and "appattempts" in timeline reader API level? To me, the timeline API is independent from the actual use cases: it merely provides concrete ways to retrieve timeline entities. Similar to the past AHS APIs, maybe we'd like to abstract a separate layer of APIs for YARN to read data from timeline service? Of course we can integrate the new "AHS REST API" layer in the timeline reader server, but code-wise shall we separate the new APIs from TimelineReaderWebService?

BTW, I set this JIRA to be patch available so that Jenkins can find the patch and test it. If you want something otherwise please feel free to change it back. Thanks!

Li Lu
added a comment - 30/Aug/16 22:10 Thanks Rohith Sharma K S ! I took a deeper look into the patch. The overall approach LGTM. One concern I have is, shall we expose the semantics of "containers" and "appattempts" in timeline reader API level? To me, the timeline API is independent from the actual use cases: it merely provides concrete ways to retrieve timeline entities. Similar to the past AHS APIs, maybe we'd like to abstract a separate layer of APIs for YARN to read data from timeline service? Of course we can integrate the new "AHS REST API" layer in the timeline reader server, but code-wise shall we separate the new APIs from TimelineReaderWebService?
BTW, I set this JIRA to be patch available so that Jenkins can find the patch and test it. If you want something otherwise please feel free to change it back. Thanks!

One concern I have is, shall we expose the semantics of "containers" and "appattempts" in timeline reader API level?

Sorry could not understand this fully. I assume that you are asking to create another web service with different path like AHSWebService which internally redirect the calls to getEntites in TimeLineReaderWebApp. So there will be two @Path i.e /ws/v2/timeline and /ws/v2/applicationhistory exposed in ATSv2. Is it ?

Similar to the past AHS APIs, maybe we'd like to abstract a separate layer of APIs for YARN to read data from timeline service?

Rohith Sharma K S
added a comment - 31/Aug/16 11:26 One concern I have is, shall we expose the semantics of "containers" and "appattempts" in timeline reader API level?
Sorry could not understand this fully. I assume that you are asking to create another web service with different path like AHSWebService which internally redirect the calls to getEntites in TimeLineReaderWebApp. So there will be two @Path i.e /ws/v2/timeline and /ws/v2/applicationhistory exposed in ATSv2. Is it ?
Similar to the past AHS APIs, maybe we'd like to abstract a separate layer of APIs for YARN to read data from timeline service?
Sorry I could not get it.

OK let me clarify: IMO the reader API of YARN timeline service should focus on serving timeline entities according to caller's request, but not on how to serve YARN specific use cases. To the storage layer of timeline service, requesting "container info" should be similar to requesting distributed shell application information or Tez job information. I noticed that in this patch, we're passing some predefined constants, like:

String entityType = TimelineEntityType.YARN_CONTAINER.toString();

This will query for a specific type of timeline entities. We may want to provide a different endpoint (like /ws/v2/applicationhistory) to support this YARN specific use case.

In v1, we have AHSWebServices to support YARN specific application history information. Maybe we would like to keep the same way?

This is my own (and subjective) idea. Feel free to let me know if you noticed some critical things I'm missing... Thanks!

Li Lu
added a comment - 31/Aug/16 18:25 OK let me clarify: IMO the reader API of YARN timeline service should focus on serving timeline entities according to caller's request, but not on how to serve YARN specific use cases. To the storage layer of timeline service, requesting "container info" should be similar to requesting distributed shell application information or Tez job information. I noticed that in this patch, we're passing some predefined constants, like:
String entityType = TimelineEntityType.YARN_CONTAINER.toString();
This will query for a specific type of timeline entities. We may want to provide a different endpoint (like /ws/v2/applicationhistory) to support this YARN specific use case.
In v1, we have AHSWebServices to support YARN specific application history information. Maybe we would like to keep the same way?
This is my own (and subjective) idea. Feel free to let me know if you noticed some critical things I'm missing... Thanks!

Separating out YARN specific details is good idea similar to v1. Here is my vote will be 50-50 for this approach.

In v1, entities were fully separated out from yarn specific details. But in v2, Apart from the entities, Query Apps for a Flow and Query Apps for a Flow Run and other details are in TimelineReaderWebService. These are belongs to YARN specific details nevertheless of any underlying storage schema. All the entities are published under application scope which makes decision harder for devs to adding a new REST YARN specific end points.

From the user perspective, I want to share you that say for retrieving all the apps with flow/flowrun uses path /ws/v2/timeline, but for retrieving attempts uses path /ws/v2/applicationhistory would lead to big question for users why there are 2 different Path for same application details!!!.

Rohith Sharma K S
added a comment - 01/Sep/16 04:48 Separating out YARN specific details is good idea similar to v1. Here is my vote will be 50-50 for this approach.
In v1, entities were fully separated out from yarn specific details. But in v2, Apart from the entities, Query Apps for a Flow and Query Apps for a Flow Run and other details are in TimelineReaderWebService. These are belongs to YARN specific details nevertheless of any underlying storage schema. All the entities are published under application scope which makes decision harder for devs to adding a new REST YARN specific end points.
From the user perspective, I want to share you that say for retrieving all the apps with flow/flowrun uses path /ws/v2/timeline, but for retrieving attempts uses path /ws/v2/applicationhistory would lead to big question for users why there are 2 different Path for same application details!!!.
May be we can takes other folks thoughts too on this.

I've got offline discussions with several folks on if we should have concepts like "app-attempt" and "container". From a designer's perspective, app-attempts and containers should not be included in timeline APIs, but from YARN users perspective, requesting app-attempt and container level information seems to be very natural operations, especially since both concepts are top level concepts in YARN. So I'm relatively fine with having terms like "containers" and "app-attempts" exposed in timeline APIs, but we may want to be very careful to not to give an impression that attempts and containers are on the hierarchical order as flows and flowruns.

Li Lu
added a comment - 02/Sep/16 00:21 I've got offline discussions with several folks on if we should have concepts like "app-attempt" and "container". From a designer's perspective, app-attempts and containers should not be included in timeline APIs, but from YARN users perspective, requesting app-attempt and container level information seems to be very natural operations, especially since both concepts are top level concepts in YARN. So I'm relatively fine with having terms like "containers" and "app-attempts" exposed in timeline APIs, but we may want to be very careful to not to give an impression that attempts and containers are on the hierarchical order as flows and flowruns.
So how about having two different hierarchical orders:
Order 1, native timeline order: cluster, user, flow, flow-run, application, entity
Order 2, YARN application order: application, app-attempt, container
Once we're not mixing the two orders in APIs, the logic should be clear. Thoughts?

So the new addition looks fine to me. Do we want to reorganize the code in a way consistent with this list? Right now the code seems to be a little bit messy. We can do it in this JIRA, or we can open a new JIRA to reorganize these APIs and discuss the endpoints that marked as weird? Thanks!

Li Lu
added a comment - 08/Sep/16 01:03 Sequence 2 in this list seems to be replacing the majority part of the old AHS APIs, except for container logs. We may use them as a starting point to support AHS-like use cases in timeline v2.

Well the current organization is based on what we are retrieving. That is, all endpoints for fetching entities are together, for fetching apps are together and so on.
We can follow approach suggested by you as well. I do not have a strong opinion on either. So I will leave it as it is. Lets see what others think.

discuss the endpoints that marked as weird

These endpoints were added to get all apps belonging to a flow so we skip the flow run section. There were use cases to fetch all apps within a flow in case run id is not known. Refer to Vrushali C's comment on YARN-3864
We also plan to list all apps for a user or queue in future as well.
And based on use case of Rohith maybe list all apps within a cluster as well. However in my personal opinion that may not be necessary. You can check with New Web UI folks.

Varun Saxena
added a comment - 08/Sep/16 06:13 - edited So the new addition looks fine to me.
Looks fine to me as well.
Do we want to reorganize the code in a way consistent with this list?
Well the current organization is based on what we are retrieving. That is, all endpoints for fetching entities are together, for fetching apps are together and so on.
We can follow approach suggested by you as well. I do not have a strong opinion on either. So I will leave it as it is. Lets see what others think.
discuss the endpoints that marked as weird
These endpoints were added to get all apps belonging to a flow so we skip the flow run section. There were use cases to fetch all apps within a flow in case run id is not known. Refer to Vrushali C 's comment on YARN-3864
We also plan to list all apps for a user or queue in future as well.
And based on use case of Rohith maybe list all apps within a cluster as well. However in my personal opinion that may not be necessary. You can check with New Web UI folks.

Li Lu
added a comment - 09/Sep/16 00:46 We can follow approach suggested by you as well. I do not have a strong opinion on either. So I will leave it as it is. Lets see what others think.
Fine with me. Once we have a clear view about the current rest end points and they're not confusing we're good.
And based on use case of Rohith maybe list all apps within a cluster as well. However in my personal opinion that may not be necessary.
Nice catch. Rohith Sharma K S if you feel we need this endpoint, please feel free to refresh the patch.
Generally LGTM. I'll wait ~24 hrs and then commit the patch.

In Li lu's REST endpoint list, there is one end point which is missed i.e listing all the apps for given flow name /users/{userid}/flows/{flowname}/apps/. This also use full API

And based on use case of Rohith maybe list all apps within a cluster as well.

This is one of the objective of this JIRA when I reported. But there is another end point from which apps can be retrieved i.e at flow name, the client can get all the flows in the cluster and for each flow, client can retrieve the apps using above specified path. So, let me confirm with Sunil does new UI has such use case to get all the apps per cluster. If so, I will refresh the patch.
cc:/ Sunil G

Rohith Sharma K S
added a comment - 12/Sep/16 03:26 - edited In Li lu 's REST endpoint list, there is one end point which is missed i.e listing all the apps for given flow name /users/{userid}/flows/{flowname}/apps/ . This also use full API
And based on use case of Rohith maybe list all apps within a cluster as well.
This is one of the objective of this JIRA when I reported. But there is another end point from which apps can be retrieved i.e at flow name, the client can get all the flows in the cluster and for each flow, client can retrieve the apps using above specified path. So, let me confirm with Sunil does new UI has such use case to get all the apps per cluster. If so, I will refresh the patch.
cc:/ Sunil G

Yes. We are looking for a complete applications page where all applications which were running/completed had to be listed. For this purpose, I think we need the api as suggested by Rohith. Being said this, We will also be showing hierarchy from flows too. Once end use land on this applications page, various filters/views could be derived. Hence we could also cover or show details of flows.

Sunil G
added a comment - 12/Sep/16 03:53 Hi Rohith Sharma K S
Yes. We are looking for a complete applications page where all applications which were running/completed had to be listed. For this purpose, I think we need the api as suggested by Rohith. Being said this, We will also be showing hierarchy from flows too. Once end use land on this applications page, various filters/views could be derived. Hence we could also cover or show details of flows.

are looking for a complete applications page where all applications which were running/completed had to be listed. For this purpose, I think we need the api as suggested by Rohith. Being said this, We will also be showing hierarchy from flows too.

So you plan to have 2 app pages. One from a specific flow run and other a list of all the apps in a cluster. Right ?Rohith Sharma K S, how do you plan to support fetching all apps within a cluster ?
Probably you can adopt the approach I had suggested. Because otherwise it would lead to full table scan.

Varun Saxena
added a comment - 12/Sep/16 06:42 are looking for a complete applications page where all applications which were running/completed had to be listed. For this purpose, I think we need the api as suggested by Rohith. Being said this, We will also be showing hierarchy from flows too.
So you plan to have 2 app pages. One from a specific flow run and other a list of all the apps in a cluster. Right ?
Rohith Sharma K S , how do you plan to support fetching all apps within a cluster ?
Probably you can adopt the approach I had suggested. Because otherwise it would lead to full table scan.
New API's required. Thoughts?
We should have them for the sake of completeness.

are looking for a complete applications page where all applications which were running/completed had to be listed.

This is something I'm not sure if we'd like to support on the timeline level. This information is pretty much available on the RM side via the state store. Why do we want to encourage an expensive operation on the timeline store for this data?

Li Lu
added a comment - 12/Sep/16 20:49 Thanks Rohith Sharma K S , the two proposed APIs make the app - attempt - container APIs more comprehensive. LGTM.
are looking for a complete applications page where all applications which were running/completed had to be listed.
This is something I'm not sure if we'd like to support on the timeline level. This information is pretty much available on the RM side via the state store. Why do we want to encourage an expensive operation on the timeline store for this data?

Updated the patch adding 2 more REST endpoints.
One small difference is there in the REST endpoints for container i.e instead of /clusters/{clusterid}/apps/{appid}/appattempts/{appattemptid}/containers/{contianer-id}, I have changed to /clusters/{clusterid}/apps/{appid}/containers/{contianer-id}.

Rohith Sharma K S
added a comment - 14/Sep/16 11:39 Updated the patch adding 2 more REST endpoints.
One small difference is there in the REST endpoints for container i.e instead of /clusters/{clusterid}/apps/{appid}/appattempts/{appattemptid}/containers/{contianer-id} , I have changed to /clusters/{clusterid}/apps/{appid}/containers/{contianer-id} .

I'm generally fine with the newly proposed REST end points (as they seem to be more like short-handed API for existing ones).

Regarding the need for listing all apps (in a cluster), I agree with others that it would be quite problematic. If I read this thread correctly, Rohith, I think you're saying you are OK without having that API since we have a query that returns all apps for a flow (and one that returns all flow activities).

Sangjin Lee
added a comment - 23/Sep/16 23:40 Sorry it took me a while to get to this.
I'm generally fine with the newly proposed REST end points (as they seem to be more like short-handed API for existing ones).
Regarding the need for listing all apps (in a cluster), I agree with others that it would be quite problematic. If I read this thread correctly, Rohith, I think you're saying you are OK without having that API since we have a query that returns all apps for a flow (and one that returns all flow activities).

Rohith Sharma K S
added a comment - 26/Sep/16 06:03 Rohith, I think you're saying you are OK without having that API since we have a query that returns all apps for a flow
RIght, I am fine with all apps for a flow.

Li Lu
added a comment - 04/Oct/16 19:10 Seems like we're reaching much agreement on the general proposal of this JIRA. Rohith Sharma K S would you please update the patch so that we can move forward with the rest process? Thanks!

Varun Saxena
added a comment - 06/Oct/16 16:16 Thanks Rohith Sharma K S for the patch.
We do not store configurations at container and app attempt level. So we can leave out query params associated with it,
Similarly no metrics are stored at app attempt level. Same goes for relationships. We can add them when we support them, if not for anything but just to reduce lines in code and documentation
Javadoc will have to be updated I think otherwise we will get a -1 in QA report. Yes this will unnecessarily increase code size but then there is no other way out.
Update tests for app attempts too ?

bq We do not store configurations at container and app attempt level. So we can leave out query params associated with it,

Similarly no metrics are stored at app attempt level. Same goes for relationships. We can add them when we support them, if not for anything but just to reduce lines in code and documentation

That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers. Lets keep it as it, if user query configs/metrics then he gets result in empty. It indicates , there is no metrics or configs published.

Javadoc will have to be updated I think otherwise we will get a -1 in QA report. Yes this will unnecessarily increase code size but then there is no other way out.

This is one of point of concern for code readability. May be we can capture in document, but not in java class file. Thought?

Rohith Sharma K S
added a comment - 07/Oct/16 12:09 bq We do not store configurations at container and app attempt level. So we can leave out query params associated with it,
Similarly no metrics are stored at app attempt level. Same goes for relationships. We can add them when we support them, if not for anything but just to reduce lines in code and documentation
That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers. Lets keep it as it, if user query configs/metrics then he gets result in empty. It indicates , there is no metrics or configs published.
Javadoc will have to be updated I think otherwise we will get a -1 in QA report. Yes this will unnecessarily increase code size but then there is no other way out.
This is one of point of concern for code readability. May be we can capture in document, but not in java class file. Thought?
Update tests for app attempts too ?
Let me update the tests and will update new patch.

That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers.

Well the intention behind my comment was that we will document these REST endpoints separately. So for the user its a distinct endpoint. The fact that we in turn redirect this call to entity table is internal implementation.
And if we indicate that you can apply config filters etc., it may indicate that we store them for containers, app attempts which we don't. And if there is something which we don't publish from RM/NM, I thought its pointless to have filter for it. Your thoughts on the same ?
We currently do not restrict clients to publish even YARN specific entities so they can potentially publish configs for it but I remember we had a JIRA related to that i.e. do not allow arbitrary clients to publish YARN entities.
I do not have a very strong opinion on this though. Let us see what others think.

This is one of point of concern for code readability. May be we can capture in document, but not in java class file. Thought?

We had initially decided we will add both javadoc and documentation. We however want to trim down the documentation a bit because some query params are repeated.
Let us keep it consistent no matter what we do. We can probably add non-javadoc comments over REST methods so that developer can read them. As such REST endpoints are already captured in documentation for outside users of ATS. I remember Rohith Sharma K S you had a JIRA for it. Maybe we can handle it there if the team reaches a consensus on this. Sangjin Lee, your opinion on this ?

Varun Saxena
added a comment - 07/Oct/16 12:36 That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers.
Well the intention behind my comment was that we will document these REST endpoints separately. So for the user its a distinct endpoint. The fact that we in turn redirect this call to entity table is internal implementation.
And if we indicate that you can apply config filters etc., it may indicate that we store them for containers, app attempts which we don't. And if there is something which we don't publish from RM/NM, I thought its pointless to have filter for it. Your thoughts on the same ?
We currently do not restrict clients to publish even YARN specific entities so they can potentially publish configs for it but I remember we had a JIRA related to that i.e. do not allow arbitrary clients to publish YARN entities.
I do not have a very strong opinion on this though. Let us see what others think.
This is one of point of concern for code readability. May be we can capture in document, but not in java class file. Thought?
We had initially decided we will add both javadoc and documentation. We however want to trim down the documentation a bit because some query params are repeated.
Let us keep it consistent no matter what we do. We can probably add non-javadoc comments over REST methods so that developer can read them. As such REST endpoints are already captured in documentation for outside users of ATS. I remember Rohith Sharma K S you had a JIRA for it. Maybe we can handle it there if the team reaches a consensus on this. Sangjin Lee , your opinion on this ?

That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers.

Regarding whether to retain things that are not needed for app attempts and so on, I don't have a strong opinion either way, but I might lean slightly towards leaving them in. Even with the existing queries for them today, we don't strongly reject them even if they are specified for app attempts but simply return empty data, right? Also, leaving them in might be slightly more future-proof. Again these are not strong preferences.

in terms of documenting, let's just be consistent and add the javadoc. Being consistent is probably a good thing here. If we decide later that javadoc is superfluous, we could take them out in bulk.

Rohith Sharma K S, does this need to go into trunk or can it stay in YARN-5355? If it is needed on trunk (which is the impression I got), then let's change it to a top level JIRA and create a patch based on the trunk. Once it is committed to the trunk, we can cherry-pick it onto YARN-5355 (and its associated branch-2 branch).

Sangjin Lee
added a comment - 07/Oct/16 18:16
That said, these REST end points are enhancement of getEntitties and getEntity. So, do not want to change any of the query params associated with app-attempts and container. In future, we never know what metrics/configs/relationships would be stored in attempts/containers.
Regarding whether to retain things that are not needed for app attempts and so on, I don't have a strong opinion either way, but I might lean slightly towards leaving them in. Even with the existing queries for them today, we don't strongly reject them even if they are specified for app attempts but simply return empty data, right? Also, leaving them in might be slightly more future-proof. Again these are not strong preferences.
in terms of documenting, let's just be consistent and add the javadoc. Being consistent is probably a good thing here. If we decide later that javadoc is superfluous, we could take them out in bulk.
Rohith Sharma K S , does this need to go into trunk or can it stay in YARN-5355 ? If it is needed on trunk (which is the impression I got), then let's change it to a top level JIRA and create a patch based on the trunk. Once it is committed to the trunk, we can cherry-pick it onto YARN-5355 (and its associated branch-2 branch).

I don't have a strong preference on those query parameters either, but I think adding them is totally fine. The latest patch generally LGTM. Rohith Sharma K S would you like to rebase it to trunk and fix the few "longer than 80 chars" warnings so that we can commit this? Thanks!

Li Lu
added a comment - 10/Oct/16 20:57 I don't have a strong preference on those query parameters either, but I think adding them is totally fine. The latest patch generally LGTM. Rohith Sharma K S would you like to rebase it to trunk and fix the few "longer than 80 chars" warnings so that we can commit this? Thanks!

On second thought, ultimate goal for YARN-5561 and YARN-5699 is to provide an easy way for YARN Web UI. It would be much helpful for YARN Web UI and also for the users who wants query containers/app-attempts from CLI if app-attempts/containers are in the form of ApplicationAttemptReport and ContainerReport format.

Basically overall intention is to provide flexible REST API to users with which they can directly make use from response output rather than allowing them to parse the ATSv2 REST output.

I think it would be good to take early decision rather than struggling at final stages which leads to many code changes in dependencies especialy YARN Web UI. Couple of approaches for providing a facility to convert TimelineEntity objects to respective YARN reports are

Launch a new web service i.e TimelineYARNEntityReaderWebServices along with TimelineReaderWebServices. This service would run as part of TimelienReader daemon. This new service is just an translator to converting TimelineEntity object to corresponding YARN reports such as ApplicationAttemptReport or ContainerReport. And this service would contains new YARN specific entities which are exposed REST end points in this patch. Note that these REST end points return type are corresponding reports like AppAttemptsInfo. This approach can be modified/enhanced to pulling out of ATSv2 to separate daemon service OR can be start new service in RM itself with application history.

For existing TimelineReaderWebServices, change the return type for newly added REST end points such as getAppAttempts/getAppAttempt/getContainers/getContainer with AppAttemptsInfo/AppAttemptInfo/ContainersInfo/ContainerInfo respectively. Cons to add in TimelienReader service is it becomes bit combination of ATS+YARN specific details.

Rohith Sharma K S
added a comment - 12/Oct/16 11:14 Thanks Sangjin Lee Li Lu Varun Saxena for reviewing patch.
On second thought, ultimate goal for YARN-5561 and YARN-5699 is to provide an easy way for YARN Web UI. It would be much helpful for YARN Web UI and also for the users who wants query containers/app-attempts from CLI if app-attempts/containers are in the form of ApplicationAttemptReport and ContainerReport format.
Basically overall intention is to provide flexible REST API to users with which they can directly make use from response output rather than allowing them to parse the ATSv2 REST output.
I think it would be good to take early decision rather than struggling at final stages which leads to many code changes in dependencies especialy YARN Web UI. Couple of approaches for providing a facility to convert TimelineEntity objects to respective YARN reports are
Launch a new web service i.e TimelineYARNEntityReaderWebServices along with TimelineReaderWebServices. This service would run as part of TimelienReader daemon. This new service is just an translator to converting TimelineEntity object to corresponding YARN reports such as ApplicationAttemptReport or ContainerReport. And this service would contains new YARN specific entities which are exposed REST end points in this patch. Note that these REST end points return type are corresponding reports like AppAttemptsInfo. This approach can be modified/enhanced to pulling out of ATSv2 to separate daemon service OR can be start new service in RM itself with application history.
For existing TimelineReaderWebServices, change the return type for newly added REST end points such as getAppAttempts/getAppAttempt/getContainers/getContainer with AppAttemptsInfo/AppAttemptInfo/ContainersInfo/ContainerInfo respectively. Cons to add in TimelienReader service is it becomes bit combination of ATS+YARN specific details.
Thoughts.? This will bring down many of the JIRA like YARN-5561 YARN-5699 etc can be combined and avoided altogether .
cc:/ Vinod Kumar Vavilapalli

Similar to option 1 in terms of having separate endpoints is what we had in AHS/ATSv1 too i.e. in the form of AHSWebServices. That will clearly delink APIs'. What do we name it though ? ws/v2/applicationhistory ?
We can further add things like serving container logs from this endpoint too. And this can act as one stop destination for fetching YARN specific history data.

Could not get the intention behind pulling it out to a separate daemon though. Can you elaborate on that ?

Varun Saxena
added a comment - 12/Oct/16 17:57 Similar to option 1 in terms of having separate endpoints is what we had in AHS/ATSv1 too i.e. in the form of AHSWebServices. That will clearly delink APIs'. What do we name it though ? ws/v2/applicationhistory ?
We can further add things like serving container logs from this endpoint too. And this can act as one stop destination for fetching YARN specific history data.
Could not get the intention behind pulling it out to a separate daemon though. Can you elaborate on that ?

I'm generally OK with option 1, as it's quite consistent with my original thought (see the comment on Aug. 30). The only thing I would like to mention here is, we gave up this idea before because we thought when web UI requesting data, it needs to contact two different endpoints. I'm not sure if with current web UI use cases this problem is alleviated.

This approach can be modified/enhanced to pulling out of ATSv2 to separate daemon service OR can be start new service in RM itself with application history.

I'd incline not to go this far as of now. Sure, architecturally we can keep this flexibility, but could you elaborate more on the concrete use cases for the need of a separate daemon for now?

Li Lu
added a comment - 12/Oct/16 18:41 I'm generally OK with option 1, as it's quite consistent with my original thought (see the comment on Aug. 30). The only thing I would like to mention here is, we gave up this idea before because we thought when web UI requesting data, it needs to contact two different endpoints. I'm not sure if with current web UI use cases this problem is alleviated.
This approach can be modified/enhanced to pulling out of ATSv2 to separate daemon service OR can be start new service in RM itself with application history.
I'd incline not to go this far as of now. Sure, architecturally we can keep this flexibility, but could you elaborate more on the concrete use cases for the need of a separate daemon for now?

I recognize the pain point you're mentioning in terms of wanting the data in the form of ApplicationAttemptReport or ContainerReport. That said, adding a new daemon should not be done lightly. I am in the opinion that the bar above which a new component is added to the system should be high. I'm not sure if this meets that bar.

Also, please note that what's contained in the current REST output would likely to be a superset of *Report; in other words, there are things in the entity API that are not in the *Report. Things like the uid and the currently discussed entity id prefix come to mind. So we cannot simply replace the return type, or things will be crippled.

As we briefly discussed during the call the other day, what we need is a translation layer that can create a Report object out of the timeline entity. If we implement such a translation layer, would it satisfy this? That way, the client gets the full timeline entity information, and it can convert it into a report object fairly easily without writing much code.

Sangjin Lee
added a comment - 12/Oct/16 18:47 I recognize the pain point you're mentioning in terms of wanting the data in the form of ApplicationAttemptReport or ContainerReport . That said, adding a new daemon should not be done lightly. I am in the opinion that the bar above which a new component is added to the system should be high. I'm not sure if this meets that bar.
Also, please note that what's contained in the current REST output would likely to be a superset of *Report ; in other words, there are things in the entity API that are not in the *Report . Things like the uid and the currently discussed entity id prefix come to mind. So we cannot simply replace the return type, or things will be crippled.
As we briefly discussed during the call the other day, what we need is a translation layer that can create a Report object out of the timeline entity. If we implement such a translation layer, would it satisfy this? That way, the client gets the full timeline entity information, and it can convert it into a report object fairly easily without writing much code.

Could not get the intention behind pulling it out to a separate daemon though. Can you elaborate on that ?

could you elaborate more on the concrete use cases for the need of a separate daemon for now?

I meant say, Initial idea is combine YARN entity reader along with TimelineReaderWebService. In case folks do not agree then next proposal is to pulling out as separate daemon. Even I am decline for pulling out from timelinereader service but if other folks have any concern for combining along with timelinereader then it is an option.

As we briefly discussed during the call the other day, what we need is a translation layer that can create a Report object out of the timeline entity. If we implement such a translation layer, would it satisfy this?

Yes. Couple of more doubts just to be in sync, Does translation layer is with-in-the-reader service or separate ? What is the format of report object? Is this report object is general or only yarn entities?

Rohith Sharma K S
added a comment - 13/Oct/16 03:25 Could not get the intention behind pulling it out to a separate daemon though. Can you elaborate on that ?
could you elaborate more on the concrete use cases for the need of a separate daemon for now?
I meant say, Initial idea is combine YARN entity reader along with TimelineReaderWebService. In case folks do not agree then next proposal is to pulling out as separate daemon. Even I am decline for pulling out from timelinereader service but if other folks have any concern for combining along with timelinereader then it is an option.
As we briefly discussed during the call the other day, what we need is a translation layer that can create a Report object out of the timeline entity. If we implement such a translation layer, would it satisfy this?
Yes. Couple of more doubts just to be in sync, Does translation layer is with-in-the-reader service or separate ? What is the format of report object? Is this report object is general or only yarn entities?

Couple of more doubts just to be in sync, Does translation layer is with-in-the-reader service or separate ? What is the format of report object? Is this report object is general or only yarn entities?

I was thinking of something like utility classes that can create and return specific report types. For example,

Sangjin Lee
added a comment - 13/Oct/16 05:25 Couple of more doubts just to be in sync, Does translation layer is with-in-the-reader service or separate ? What is the format of report object? Is this report object is general or only yarn entities?
I was thinking of something like utility classes that can create and return specific report types. For example,
ApplicationAttemptReport getApplicationAttemptReport(TimelineEntity appAttemptEntity);
ContainerReport getContainerReport(TimelineEntity containerEntity);
These classes can be (and probably would need to be) in the yarn common module so any REST client can invoke it to get a translated object back.

Though utility class is very useful for converting reports, one major concern is number of REST calls to be invoked by user. Especially in Web, this will become 2 REST calls which decrease the performance.

Basically, me and Li lu was thinking to embedded TimelineYARNEntityReaderWebService to timelinereader daemon with REST path /ws/v2/applicationhistory. These REST end points would take care of retrieving entities from storage and convert to required YARN reports such as ApplicationAttemptReport or ContainerReport etc.

Also, please note that what's contained in the current REST output would likely to be a superset of *Report;

I agree that current *Report would be subset of TimelineEntity object. So , this we can solve by defining new *Report which compatible with ATSv2 either by extending current *reports since metrics are required. If any new fields published to ATS, then It is contract between publisher and reader that these information fields should added in *Report. New *report class is only wrapper over TimelineRntity object for YARN entities. And at any point of time, if user thinks he need more informations, he can query from ATSv2 which is always open.

Rohith Sharma K S
added a comment - 13/Oct/16 06:12 Thanks for clarifications..
Though utility class is very useful for converting reports, one major concern is number of REST calls to be invoked by user. Especially in Web, this will become 2 REST calls which decrease the performance.
Basically, me and Li lu was thinking to embedded TimelineYARNEntityReaderWebService to timelinereader daemon with REST path /ws/v2/applicationhistory . These REST end points would take care of retrieving entities from storage and convert to required YARN reports such as ApplicationAttemptReport or ContainerReport etc.
Also, please note that what's contained in the current REST output would likely to be a superset of *Report;
I agree that current *Report would be subset of TimelineEntity object. So , this we can solve by defining new *Report which compatible with ATSv2 either by extending current *reports since metrics are required. If any new fields published to ATS, then It is contract between publisher and reader that these information fields should added in *Report. New *report class is only wrapper over TimelineRntity object for YARN entities. And at any point of time, if user thinks he need more informations, he can query from ATSv2 which is always open.

I was also thinking that we can have an endpoint like /ws/v2/applicationhistory.
This can be used to not only serve app attempt.container reports but also serve use cases like serving aggregated logs of historical apps. We would need to serve aggregated logs from somewhere too. This will be useful for UI as well.

Regarding metrics, well we can always extend ContainerInfo to carry metrics as well. Right ?

Varun Saxena
added a comment - 13/Oct/16 16:29 I was also thinking that we can have an endpoint like /ws/v2/applicationhistory.
This can be used to not only serve app attempt.container reports but also serve use cases like serving aggregated logs of historical apps. We would need to serve aggregated logs from somewhere too. This will be useful for UI as well.
Regarding metrics, well we can always extend ContainerInfo to carry metrics as well. Right ?

During weekly sync up, we had discussion regarding for providing new REST URL i.e /ws/v2/applicationhistory. One of the point discussed is what is the advantage is having separate REST end points which returns is *Report format? It will become a another version existing data with different format. But ATSv2 TimelineEntity is super set of all where in user might required information which are not in *Report. So, if required information are published under first class entity info level like YARN-5699 doing, then it is more than sufficient.

The consensus are

we publish the yarn entities important information at first class entity info level so that user can make use of these information to query using infofilters.

We will be going ahead with adding new REST end points(patch already attached i.e YARN-5561.03.patch) which are enhancement of existing getEntities.

Rohith Sharma K S
added a comment - 14/Oct/16 14:53 During weekly sync up, we had discussion regarding for providing new REST URL i.e /ws/v2/applicationhistory . One of the point discussed is what is the advantage is having separate REST end points which returns is *Report format? It will become a another version existing data with different format. But ATSv2 TimelineEntity is super set of all where in user might required information which are not in *Report. So, if required information are published under first class entity info level like YARN-5699 doing, then it is more than sufficient.
The consensus are
we publish the yarn entities important information at first class entity info level so that user can make use of these information to query using infofilters.
We will be going ahead with adding new REST end points(patch already attached i.e YARN-5561 .03.patch) which are enhancement of existing getEntities.

Yeah what I meant is a separate endpoint may be required for things like that. Not sure what we name it though. Or we can just put everything under the hood of ws/v2/timeline. Anyways maybe a separate JIRA should be filed for that disucssion.

Varun Saxena
added a comment - 17/Oct/16 11:25 Yeah what I meant is a separate endpoint may be required for things like that. Not sure what we name it though. Or we can just put everything under the hood of ws/v2/timeline. Anyways maybe a separate JIRA should be filed for that disucssion.

Rohith Sharma K S, should we update documentation for this ?
We can also do it in a separate JIRA though because some of the query params seem to be repeated in the documentation. Probably we can consolidate these query params at a single place in that JIRA and add these endpoints as well. Or you can add the endpoints here itself. As you wish.

Varun Saxena
added a comment - 17/Oct/16 12:02 Rohith Sharma K S , should we update documentation for this ?
We can also do it in a separate JIRA though because some of the query params seem to be repeated in the documentation. Probably we can consolidate these query params at a single place in that JIRA and add these endpoints as well. Or you can add the endpoints here itself. As you wish.

I would prefer to take it up as separate JIRA as consolidated documentation update. And also I suspect and pretty sure, there are couple of more changes will come which require to modify documentations.

Rohith Sharma K S
added a comment - 17/Oct/16 12:21 I would prefer to take it up as separate JIRA as consolidated documentation update. And also I suspect and pretty sure, there are couple of more changes will come which require to modify documentations.

Varun Saxena
added a comment - 17/Oct/16 17:21 That should be fine. We can keep a JIRA open till next drop on trunk and update documentation via it just like we did before first drop on trunk. There can be multiple contributors to the JIRA.

Sangjin Lee
added a comment - 17/Oct/16 17:31 Rohith Sharma K S , could you kindly update v.03 of the patch to address several checkstyle issues (the ones other than the number of arguments violations)? Then we can go ahead with this. Thanks!

Varun Saxena
added a comment - 17/Oct/16 17:49 Few nits.
"Return a set of application-attempts entities" should be "Return a set of application-attempt entities" in javadoc
Return a single application-attempt entity of the given Id => Return a single application-attempt entity for the given attempt Id in javadoc
Return a set of containers entities belongs to given application attempt => Return a set of container entities belonging to given application attempt in javadoc
Above getContainer method javadoc says " Return a single application-attempt entity of the given Id" which is incorrect.

Rohith Sharma K S
added a comment - 18/Oct/16 17:23 Updated patch with addressing Varun's comments and fixed unused import checkstyle issue. The patch given is on trunk and I think it will be applied to YARN-5355 branch also.

A small nit again in javadoc. Sorry for nitpicking.
"Return a set of containers belongs to given application attempt id" should be "Return a set of container entities belonging to given application attempt id"

Varun Saxena
added a comment - 18/Oct/16 18:13 A small nit again in javadoc. Sorry for nitpicking.
"Return a set of containers belongs to given application attempt id" should be "Return a set of container entities belonging to given application attempt id"
Other than that +1 from my side too.