Currently looking at monitoring a switchyard app for the BAM work. I am using some test exchange handlers to generate some test activity events - however had some questions/observations:

1) Currently communications between services defined in a switchyard app are internal to a server. Is there expected to be a distributed version of the infrastructure, so that one service may execute on ServerA and another on ServerB, for load balancing purposes? If so, will there be a way to order the exchange handlers so that they can either be before or after the message is distributed between the servers?

- reason being that if I am interested in monitoring the performance of communications between two services packaged in a single switchyard app, I need to be able to record an event at the point that service 1 'sends' the message, and then another event when service 2 'receives' the message.

- so this means ideally two separate handlers would be required, one at the beginning of the exchange pipeline (before the message may be distributed to a remote server), and another at the end of the pipeline, just prior to it being dispatched to the target service.

- the other reason for being able to position the handlers is that the logged event may need to include the representation of the message as it is seen by the service, and not after it may have been transformed.
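To illustrate why handler position matters, here is a minimal Java sketch. `HandlerPipeline`, `Handler` and `demo` are hypothetical names and not SwitchYard APIs, and the upper-casing step merely stands in for whatever transformation might sit in the pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are NOT SwitchYard APIs.
final class HandlerPipeline {

    /** A handler may log the message it sees and may return a changed message. */
    interface Handler {
        String handle(String message, List<String> out);
    }

    /** Runs the handlers in list order, passing each one the previous result. */
    static String run(List<Handler> chain, String message, List<String> out) {
        String current = message;
        for (Handler handler : chain) {
            current = handler.handle(current, out);
        }
        return current;
    }

    /** Chain: monitor at the start, stand-in transformer, monitor at the end. */
    static List<String> demo() {
        List<Handler> chain = new ArrayList<Handler>();
        chain.add((msg, out) -> { out.add("sent:" + msg); return msg; });
        chain.add((msg, out) -> msg.toUpperCase());
        chain.add((msg, out) -> { out.add("received:" + msg); return msg; });
        List<String> log = new ArrayList<String>();
        run(chain, "order", log);
        return log;
    }
}
```

The first monitor logs the message as the sender produced it, the last one logs it as the target service will see it, which is the distinction the positioning request is about.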

2) Knowledge of whether the exchange begins or ends at a binding

- from a monitoring perspective, a message received by a binding has actually been 'sent' by another external service that may also be monitored.

- however currently there does not seem to be a way within the exchange handler to determine whether this is such a case, and therefore whether to create the 'sent' activity event.

- so this means that potentially the monitoring system may have the 'sent' activity recorded twice

- as the binding is essentially a proxy for the external calling/called service, it would be good if the handler could be aware of this, and therefore not record those activity events

1) Currently communications between services defined in a switchyard app are internal to a server. Is there expected to be a distributed version of the infrastructure, so that one service may execute on ServerA and another on ServerB, for load balancing purposes? If so, will there be a way to order the exchange handlers so that they can either be before or after the message is distributed between the servers?

- reason being that if I am interested in monitoring the performance of communications between two services packaged in a single switchyard app, I need to be able to record an event at the point that service 1 'sends' the message, and then another event when service 2 'receives' the message.

- so this means ideally two separate handlers would be required, one at the beginning of the exchange pipeline (before the message may be distributed to a remote server), and another at the end of the pipeline, just prior to it being dispatched to the target service.

- the other reason for being able to position the handlers is that the logged event may need to include the representation of the message as it is seen by the service, and not after it may have been transformed.

Distribution to remote nodes will be supported and from a runtime standpoint it will likely look very similar to a service binding. Think of it as an 'internal' binding which is implicit in clustered environments and handles routing between instances in a distributed setup.

Can you give me an idea of what you are trying to measure between 'send' and 'receive'? This is really just measuring the execution time of the handler chain without including the processing time of the service provider. Is that what you want? We could always flip the question on its head and say "which types of measurements are you trying to provide?". From that, we can back into the right place to put handlers and stuff.

2) Knowledge of whether the exchange begins or ends at a binding

- from a monitoring perspective, a message received by a binding has actually been 'sent' by another external service that may also be monitored.

- however currently there does not seem to be a way within the exchange handler to determine whether this is such a case, and therefore whether to create the 'sent' activity event.

- so this means that potentially the monitoring system may have the 'sent' activity recorded twice

- as the binding is essentially a proxy for the external calling/called service, it would be good if the handler could be aware of this, and therefore not record those activity events

I think some concrete examples of the types of measurements you are trying to take here would be a big help. Personally, I think recording the execution time of external callouts is critical from a monitoring perspective, since a slow external service can have a direct impact on the response time of the service invoking it.

BTW, I think binding metadata will be an interesting area to explore. At the moment, we just map binding-specific details into the context using a context mapper. We have a uniform set of APIs for that purpose, but we don't really have a uniform approach to where context properties go (e.g. naming, grouping), which likely will be necessary for any type of generic monitoring purpose.

Essentially what I want to capture is the point at which a message is sent from one service and received by another, to be able to measure latency and have environment information associated with each distinct event. So for a request/response pair, there would effectively be four individual monitoring points. Where the services were within the same switchyard app, and co-located in the same server, this latency may be minimal - but in the case of distributed nodes in a clustered environment, it may provide some useful stats. But it will be most useful where switchyard services are interacting externally.

So when a binding is involved we don't want to create send or receive monitoring events for those points, as the message would have originated from (or be destined for) an external service.

Can we use a concrete example for these events? Maybe we can take a quickstart or a modified version of one of the quickstarts and then talk about the events you would like to see. I'm still a bit confused about why a binding would result in send and receive events being filtered out. I *think* that the use case you're focused on here is when one SwitchYard instance talks to another SwitchYard instance - creating an ability to measure the point of send and the point of receive, where the delta corresponds to the amount of time consumed by transmission. Is that right?

Supplier service is exposed to client apps via various bindings, including SOAP and JMS

In this example, assume that we are monitoring the client app and Logistics service, although potentially these could be in different organisations (i.e. outside our monitoring domain).

So when a client application places an order:

a 'request sent' event would be logged at the client app

a 'request received' event would then be logged as the message was consumed by the Supplier service

the Supplier service then invokes the CreditAgency to perform a credit check, resulting in a 'request sent' event being logged

a 'request received' event would then be logged as the message was consumed by the CreditAgency service

a 'response sent' event would then be logged as the result was returned from the CreditAgency service

a 'response received' event would be logged as the Supplier service consumes the result

a 'request sent' event is recorded when the Supplier service invokes the external Logistics service (via SOAP binding)

if within our monitoring domain, then the Logistics service would log a 'request received' followed by a 'response sent' when the operation completed

a 'response received' event would then be logged at the Supplier service upon receipt of the logistics response

then finally the Supplier service would cause a 'response sent' to be logged when it returned the result to the client app, which would equally log a 'response received'

So essentially, regardless of whether the communications are internal (between two services contained in a switchyard app, but that may be distributed) or between two services that communicate via a binding, it does not matter - what is being recorded is when one service sends a message, and the other service receives it.

Bindings are unimportant as they are a part of the transport enabling communication between the services. It is the service boundaries that are important.
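As a rough sketch of how the latency between a 'sent' and a 'received' event in the example could be derived, the following Java models the events as plain objects. `ActivityEvent` and `LatencyCalculator` are hypothetical names for illustration only, and the sketch assumes the timestamps of the two services are comparable (e.g. synchronised clocks), which may not hold across servers.

```java
import java.util.List;

// Hypothetical event model; not part of SwitchYard or any BAM API.
final class ActivityEvent {
    enum Kind { REQUEST_SENT, REQUEST_RECEIVED, RESPONSE_SENT, RESPONSE_RECEIVED }

    final Kind kind;
    final String service;     // service that logged the event
    final long timestampMs;   // assumed comparable across services

    ActivityEvent(Kind kind, String service, long timestampMs) {
        this.kind = kind;
        this.service = service;
        this.timestampMs = timestampMs;
    }
}

final class LatencyCalculator {
    /**
     * Returns the time between the first REQUEST_SENT logged by the consumer
     * and the next REQUEST_RECEIVED logged by the provider.
     */
    static long requestLatencyMs(List<ActivityEvent> events, String consumer, String provider) {
        Long sentAt = null;
        for (ActivityEvent e : events) {
            if (sentAt == null) {
                if (e.kind == ActivityEvent.Kind.REQUEST_SENT && e.service.equals(consumer)) {
                    sentAt = e.timestampMs;
                }
            } else if (e.kind == ActivityEvent.Kind.REQUEST_RECEIVED && e.service.equals(provider)) {
                return e.timestampMs - sentAt;
            }
        }
        throw new IllegalArgumentException("no matching sent/received pair");
    }
}
```

The same calculation applies whether the two services are co-located, distributed across nodes, or connected through a binding, since only the service boundaries log events.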

Thanks, the example helps quite a bit. We can trigger an event for sent and received, but I want to give you an idea of what these actually mean under the hood.

For "sent" events:

Request Sent will be triggered when a message is sent via Exchange.

Response Sent will be triggered only if the MEP is In-Out and it's triggered when the provider's handler returns. This corresponds to the provider implementation completing its processing - the reply message is then routed to the service consumer.

For "receive" events:

Request Received will be triggered before the provider handler is invoked.

Response Received will be triggered only if the MEP is In-Out and before the consumer's reply handler is invoked.
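The rules above can be summarised in a small Java sketch. `ExchangeEvents`, its `Mep` enum and the event names are made up for illustration and are not the actual SwitchYard types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of the event rules; not SwitchYard code.
final class ExchangeEvents {
    enum Mep { IN_ONLY, IN_OUT }

    /** Lists, in order, the events expected for one exchange with the given MEP. */
    static List<String> eventsFor(Mep mep) {
        List<String> events = new ArrayList<String>();
        events.add("REQUEST_SENT");          // message sent via the exchange
        events.add("REQUEST_RECEIVED");      // just before the provider handler is invoked
        if (mep == Mep.IN_OUT) {
            events.add("RESPONSE_SENT");     // provider handler has returned
            events.add("RESPONSE_RECEIVED"); // just before the consumer reply handler is invoked
        }
        return events;
    }
}
```

So an In-Only exchange yields two events and an In-Out exchange yields four, matching the four monitoring points mentioned earlier for a request/response pair.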

The thing I'm not clear about is what useful metadata we can provide w/r/t bindings that would allow you to disqualify a given event. Assuming all the services in your example were remote, every service invocation would involve a binding. Based on the event rules I listed above, there would be a request sent event for the original implementation and the service binding in the remote runtime (assuming it's SwitchYard in both instances).

Going back to your example, can you give an idea of the measurements you would want to take over the events being generated? We already have the execution time for services and references today, so this allows you to measure things like:

What is the response time for Supplier service?

What is the response time for CreditAgency when it's invoked from Supplier service?

Sorry for all the questions. I'm happy to add any events you want, just want to make sure you get what you need.

The thing I'm not clear about is what useful metadata we can provide w/r/t bindings that would allow you to disqualify a given event. Assuming all the services in your example were remote, every service invocation would involve a binding. Based on the event rules I listed above, there would be a request sent event for the original implementation and the service binding in the remote runtime (assuming it's SwitchYard in both instances).

Would it be possible to provide a source and target component category metadata - e.g. binding or service component? As otherwise the situation you describe would occur and result in duplicate events.
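A minimal sketch of how such category metadata might be used to avoid the duplicates, assuming one reading of the proposal: 'sent' is only recorded when the source is a service component, and 'received' only when the target is one. `EventFilter` and `Category` are hypothetical names, not existing SwitchYard metadata.

```java
// Hypothetical filter; the Category metadata does not exist in SwitchYard as shown.
final class EventFilter {
    enum Category { BINDING, SERVICE_COMPONENT }

    /** A 'sent' event is kept only if a service component originated the exchange. */
    static boolean recordSent(Category source) {
        return source == Category.SERVICE_COMPONENT;
    }

    /** A 'received' event is kept only if a service component is the target. */
    static boolean recordReceived(Category target) {
        return target == Category.SERVICE_COMPONENT;
    }
}
```

With this rule, a remote call from one runtime to another would produce one 'sent' (component to reference binding) and one 'received' (service binding to component), since the binding-originated 'sent' in the remote runtime is skipped.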

Keith Babo wrote:

Going back to your example, can you give an idea of the measurements you would want to take over the events being generated? We already have the execution time for services and references today, so this allows you to measure things like:

What is the response time for Supplier service?

What is the response time for CreditAgency when it's invoked from Supplier service?

Although the response time for an invocation would be of interest, as well as the latency between a send and receive - end users may want measurements from any event to any other potentially, e.g. the time taken between accepting an order and having it dispatched to a supplier must be within 'x' seconds. So the scope of the metric may not be the same as the scope of a service operation.

Having this level of events is also useful to scope other activity events that may occur during the course of the services being executed.
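To make the 'any event to any other' idea concrete, here is a small Java sketch that checks the elapsed time between two named events against a limit. `EventSpanCheck` and the event names used with it are hypothetical.

```java
import java.util.Map;

// Hypothetical check; event names and the timestamp map are illustrative only.
final class EventSpanCheck {
    /**
     * Returns true if the time from event 'from' to event 'to' is within limitMs.
     * The map holds one timestamp (in milliseconds) per event name.
     */
    static boolean withinLimit(Map<String, Long> timestampsMs, String from, String to, long limitMs) {
        Long start = timestampsMs.get(from);
        Long end = timestampsMs.get(to);
        if (start == null || end == null) {
            throw new IllegalArgumentException("missing event: " + (start == null ? from : to));
        }
        return end - start <= limitMs;
    }
}
```

For example, 'from' could be the event logged when the order is accepted and 'to' the event logged when it is dispatched to the supplier, regardless of how many service operations sit in between.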

Keith Babo wrote:

Sorry for all the questions. I'm happy to add any events you want, just want to make sure you get what you need.

Would it be possible to provide a source and target component category metadata - e.g. binding or service component? As otherwise the situation you describe would occur and result in duplicate events.

This is not difficult for receive events. It's tricky for sent events. I think I can hide it behind ServiceReference, but need to think that through.

Although the response time for an invocation would be of interest, as well as the latency between a send and receive - end users may want measurements from any event to any other potentially, e.g. the time taken between accepting an order and having it dispatched to a supplier must be within 'x' seconds. So the scope of the metric may not be the same as the scope of a service operation.

Having this level of events is also useful to scope other activity events that may occur during the course of the services being executed.

OK, I'm with you now. Have you considered what the correlation key will be for these events?

There will be different levels of correlation key - some from the messages being exchanged (to correlate to the business transaction), some from BPM execution to a process executing locally, etc. Some events won't have any correlation information.

So the idea is that activity events will be collected within a group associated with a scope (or transaction if available), so that the correlation information will be relevant across the group of events.
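A rough Java sketch of that grouping idea, under the assumption that a group simply accumulates whatever correlation keys its events carry. `ActivityScope` is a hypothetical name and not an existing API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical grouping of activity events by scope; illustrative only.
final class ActivityScope {
    private final String scopeId;
    private final List<String> events = new ArrayList<String>();
    private final Set<String> correlationKeys = new LinkedHashSet<String>();

    ActivityScope(String scopeId) {
        this.scopeId = scopeId;
    }

    /** Adds an event; correlationKey may be null for events that carry none. */
    void record(String event, String correlationKey) {
        events.add(event);
        if (correlationKey != null) {
            correlationKeys.add(correlationKey);
        }
    }

    String scopeId() { return scopeId; }

    List<String> events() { return events; }

    /** Keys contributed by any event apply to every event in the group. */
    Set<String> correlationKeys() { return correlationKeys; }
}
```

An event without its own correlation information can then still be related to the business transaction through the keys of the other events in the same scope.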