tracking SA AM events via msgEngine.log

‏2012-03-21T12:34:43Z

Hi,

We're looking to analyse the SA AM msgEngine.log file to build up a list of recent events that have occurred. We can see various resource 'Start' and 'Stop' commands being initiated, but we can't see where they have ended.

However, after we issue a 'Restart' command, we get a 'Restart of resource group ... is completed successfully after ... seconds' message, but we see no indication of a stop or start command being completed.

Does anyone know why this is and where this information might be obtained from?

Re: tracking SA AM events via msgEngine.log

1. Online/Offline request messages in the E2E domain log do not have a "completed" message
First I want to summarize what the log of the E2E domain basically shows:

Actions an operator did, interacting with the E2E Automation domain and its resources

Problems that occurred within the automation engine, e.g. caused by a wrong configuration, and that need operator attention

Problems with resources that cannot be recovered by the automation engine and need operator attention

Information about FLA domains joining or leaving, and about changes in the communication state to an FLA domain

The log does not provide information about

State changes of single resources in the e2e policy

Therefore what you see is the request that an operator has issued against a resource reference.

When a "request to start a resource reference" is placed, it is accepted by the automation engine and put on the request stack of this resource. This can change the desired state of this resource and possibly also the desired state of many other dependent resources, e.g. through a startAfter relationship. As a result, the end-to-end automation engine may create requests immediately and send them to an FLA domain - but this can also happen much later or even never (because the relationship is never fulfilled - think of a startAfter to another reference which points to an FLA domain that never becomes online).
When the request is actually sent to the referenced FLA automation domain, this game starts again (if it's a request-driven domain, such as SAMP or SA z/OS). The request now sent by the automation engine on behalf of the original user request is "accepted" by the FLA domain and might cause the desired state of this and possibly many other resources to change. At some point the request might be the "winning" one and will be fulfilled.
So, in these last sentences I tried to summarize in a "few" words the concept of "request-driven automation". Unfortunately it is not easy to correlate an "action (start)" directly with its "reaction (state indicates it's started)" - and it's not always predictable how long it will take.
The automation engine does not remember when a request was placed on a resource and thus cannot calculate the time until something has really been started (which otherwise could certainly be logged).
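
To make the point about requests and state changes being decoupled more concrete, here is a toy model in Python. It is only a sketch and not how the automation engine is implemented: the `Resource` class, the `automation_cycle` function and the rule "the most recent request wins" are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for a resource reference (hypothetical, not a product class)."""
    name: str
    observed_online: bool = False
    requests: list = field(default_factory=list)     # the "request stack"
    start_after: list = field(default_factory=list)  # resources that must be online first

    def place_request(self, action):
        # The request is accepted and stacked. Nothing is started here and
        # no timestamp is kept, so no duration can be computed later.
        self.requests.append(action)

    def desired_online(self):
        # Assumed simplification: the most recent request is the winning one.
        return bool(self.requests) and self.requests[-1] == "online"

    def can_start(self):
        # A start is only possible once every startAfter target is online.
        return self.desired_online() and all(r.observed_online for r in self.start_after)


def automation_cycle(resources):
    """One evaluation pass: start whatever is desired online and not blocked."""
    started = []
    for res in resources:
        if not res.observed_online and res.can_start():
            res.observed_online = True
            started.append(res.name)
    return started


db = Resource("db")
app = Resource("app", start_after=[db])

app.place_request("online")
first_pass = automation_cycle([db, app])   # app stays blocked, db is not online

db.place_request("online")
second_pass = automation_cycle([db, app])  # db starts, which unblocks app
```

In this model the request on `app` is accepted at once, but `app` only starts after a later, unrelated request brings `db` online. If that second request never arrived, `app` would never start.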

2. Restart request does log a "successful" message at the end
Well, besides the fact that this restart request is quite new...
...it has also been designed and implemented in a different way and cannot be explained with the request-driven approach above.
The restart request as implemented by SA AppMan is more of a "convenience" method which simulates the actions of an operator:
a) Issue Offline request
b) Wait until resource becomes offline
c) Cancel Offline request
It is more of a very small workflow - which even has a customizable timeout. Here we were able to measure the time and to log it in the message you are referring to. We thought that operators would be more likely to use the restart on FLA resources or on ResourceReferences which do not have a big tree of relationships that would inhibit the immediate execution of stop/start requests as described above.
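
The three steps above, together with the timeout and the time measurement, can be sketched roughly as follows. This is a Python illustration only, not the product code: the function name `restart` and the three callables passed into it are hypothetical hooks, and cancelling the Offline request even after a timeout is my own assumption.

```python
import time


def restart(issue_offline, is_offline, cancel_offline,
            timeout_s=60.0, poll_s=1.0):
    """Run the small restart workflow and return the elapsed seconds.

    issue_offline, is_offline and cancel_offline are hypothetical hooks
    supplied by the caller. Raises TimeoutError if the resource does not
    become offline within timeout_s.
    """
    started = time.monotonic()
    issue_offline()                          # a) issue Offline request
    try:
        while not is_offline():              # b) wait until resource is offline
            if time.monotonic() - started > timeout_s:
                raise TimeoutError("resource did not become offline in time")
            time.sleep(poll_s)
    finally:
        # c) cancel Offline request (assumption: also done after a timeout)
        cancel_offline()
    return time.monotonic() - started


# Usage with a fake resource that reports offline on the third poll.
state = {"polls": 0, "log": []}


def fake_issue():
    state["log"].append("offline requested")


def fake_is_offline():
    state["polls"] += 1
    return state["polls"] >= 3


def fake_cancel():
    state["log"].append("offline cancelled")


elapsed = restart(fake_issue, fake_is_offline, fake_cancel,
                  timeout_s=5.0, poll_s=0.0)
```

Because the whole sequence runs inside one function, the start time is known at the end, which is why a duration can be reported for a restart but not for a plain start or stop request.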

3. How can you track the list of recent events?
Here we have quite a few options.
a) You can enable report data collection for SAMP, SA z/OS and SA AM resources and thus find the information in the "history" tab of the operations console.
b) To see a very detailed overview of all resource state changes, including answers to the question "How long did it take for a resource to start?", you can make use of our built-in report generation feature with TCR.
c) In the E2E command shell you can get an overview of all actions and events received by and sent from the automation engine by using the command "eezdiag -t hist"
d) You can get the same history by doing a "grep 'Event History:'" on the file traceEngine.log
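
If you would rather post-process the file from a script than with grep, option d) could look roughly like this in Python. It is a minimal sketch: the function names are made up, and the sample lines are placeholders that do not reflect the real format of traceEngine.log. Only the marker text 'Event History:' is taken from the grep above.

```python
def event_history_lines(lines, marker="Event History:"):
    """Yield every line that contains the marker, without the trailing newline."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def read_event_history(path):
    """Return all matching lines from the file at the given path."""
    # errors="replace" keeps the scan going if the file has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(event_history_lines(handle))


# Placeholder input, NOT real log output.
sample = [
    "first line\n",
    "x Event History: placeholder entry\n",
    "last line\n",
]
matches = list(event_history_lines(sample))
```

On a real system you would call `read_event_history` with the location of traceEngine.log on your installation.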

I hope this information helps a bit.
(And I hope even more that you can find the information you want by using the methods we provide...)

Re: tracking SA AM events via msgEngine.log

Ok - so 'restart' requests are handled by a separate process and not the main automation engine, which is why this process can do its own monitoring of the restart and write its own messages to the msgEngine.log?