I have an async Incident-update CPM that executes through a business rule. The CPM needs to get the latest values of certain Incident fields and send them to an external system via cURL. One of the field updates I need to send is the associated Contact details.

When an incident field such as Status, Thread, etc. is updated and the Incident is saved, I can access the latest updated value of these fields and send them to the external system. However, when I update the contact and save the Incident, the Incident record in the CPM simply does not reflect the latest value of the associated contact.

Suppose contact A was associated with the Incident and I changed it to B. The Incident record in the CPM will still give me the value of A. Now, if I change the contact again to C, I will see either A or B. I keep getting a seemingly random contact ID associated with the Incident; sometimes it's the contact I set a few saves before.

Now, I have tried reading $incident->PrimaryContact with:

- the object instance available to the CPM in the apply() method
- a fetch() of the incident record, then accessing the contact
- a ROQL query for the incident, then accessing the contact

No matter how I try to get the Contact value, I always get the wrong result.
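For reference, the three attempts look roughly like this. This is a sketch against the Connect PHP API inside the CPM's apply() method, where $obj is the Incident instance passed in; field paths and ROQL column keys may vary with your schema version.

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// 1) Read the contact straight off the object handed to apply()
$contactId = $obj->PrimaryContact->ID;

// 2) Re-fetch the incident and read the contact from the fresh copy
$incident  = RNCPHP\Incident::fetch($obj->ID);
$contactId = $incident->PrimaryContact->ID;

// 3) Query the contact via ROQL
$rows = RNCPHP\ROQL::query(
    "SELECT Incident.PrimaryContact.ID FROM Incident WHERE ID = " . $obj->ID
)->next();
if ($row = $rows->next()) {
    $contactId = $row['ID'];  // column key may differ depending on the query
}
```

All three of these return the stale contact when the CPM runs asynchronously, per the description above.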

Note: There is no issue when I do NOT run the CPM asynchronously. Since I want to make a call to an external system, I need to run it async.

Did anyone encounter this before? Is it a known issue or a product bug that I am not aware of?


The nature of an asynchronous process is that its execution is not guaranteed to happen at a particular time. So, you cannot take action on a record and have an expectation as to the record's state when an async CPM runs, because it could run a microsecond after the record update or a minute later. The record at the DB level may have a lot happen to it between the time the save is called and the time the async CPM fires. I think the behavior you describe would be expected given the nature of the process. Synchronous processes, on the other hand, do happen in a certain order, so there is access to the "prev" variable on objects, which holds the state of the object before the change.

You could do something that captures the state change with synchronous CPMs, then call the async CPM through a different vector, like a different save that triggers the integration CPM via business rules, so that you have an operation that is aware of the state of the record and then an integration step that can operate asynchronously after you have the data that you need.
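That two-step pattern could look something like the following sketch. The custom field name contact_snapshot is hypothetical, and the save is suppressed so the write does not re-trigger rules or CPMs.

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// Synchronous CPM: snapshot the current contact into a custom field.
// Because this runs in the same event cycle as the editor's save, the
// value of PrimaryContact is the one that was just selected.
$incident = $obj;  // Incident instance passed to apply()
$incident->CustomFields->c->contact_snapshot = (string) $incident->PrimaryContact->ID;
$incident->save(RNCPHP\RNObject::SuppressAll);  // avoid re-triggering rules/CPMs
```

The async integration CPM would then read CustomFields->c->contact_snapshot instead of PrimaryContact, so it operates on the value captured at save time.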

I do understand we cannot know exactly when the process will run, and I am not expecting it to be time-critical. However, it should/must run after the record is updated, right? As you mentioned, it could be a microsecond after the update or a minute later. I am not counting on the timing of the execution, but the sequence should be correct. I also don't need the 'prev' value for this task, so that is not a concern. What I don't understand is getting a value which was updated multiple saves prior to the current update. How am I getting a value which has already been overwritten a few times? Where is the old value coming from?

Yes, I also implemented a workaround similar to the one you suggested: a non-async CPM saves the updated contact to a custom field, and the async CPM then reads that custom field to get the desired value. Not ideal, and a bit of a hack. So I came here looking for a cleaner solution.

Depending on how you are querying the information, and the delay on the replication database, you might see a delay like this. How are you querying for your data in the CPM? If you use a ROQL query and specify the operational database, does that address the delay issue that you see?

(Just be careful to only use operational DB queries where absolutely needed and leave the rest to the reporting db.)

I could be mistaken, but I think you are going to be stuck using PHP to try to run code asynchronously in a predictable way. The problem is that asynchronous code can run multiple times with different results, and I do not think the PHP binary that comes with Service Cloud will let you do anything fancy to solve this (but I could be wrong).

In the JavaScript/Node world, this is typically solved using Promises; however, I am not sure how you could use those on Service Cloud.

I would try using the Service Cloud REST API and writing a script that runs sequentially (not asynchronously) and processes the changes you need in the correct order. You could use the same REST API to handle the changes you would like to make to the incident record.
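As a sketch, the hosted script could pull the incident's current state through the OSvC REST Connect API before forwarding it, so the read reflects the record at integration time rather than a value captured earlier. The site URL and credentials below are placeholders.

```php
<?php
// Fetch the incident's current primary contact via the OSvC REST API.
$incidentId = 42;  // would come from the CPM's notification payload
$url = "https://yoursite.custhelp.com/services/rest/connect/v1.4/incidents/"
     . $incidentId . "?fields=primaryContact";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, "api_user:api_password");  // placeholder credentials
$response = curl_exec($ch);
curl_close($ch);

$incident = json_decode($response, true);
// ...forward $incident['primaryContact'] to the external system...
```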

The only difficulty would be where to host such a script and how to make it accessible externally. I use Heroku for all kinds of hosting projects, but that is just what I am most familiar with; it may not be the best tool for your needs.

Then you would write an async CPM that makes a call to the script you have hosted. The CPM would only be responsible for pushing data; your integration script would be responsible for making the changes across the different services (including Service Cloud).
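On the Service Cloud side, the async CPM then reduces to a thin notifier, roughly like this sketch (the endpoint URL is a placeholder; load_curl() loads the platform's cURL extension):

```php
// Async CPM body: only tell the integration script *which* incident
// changed; the script re-reads the record and does the actual work.
\load_curl();
$payload = json_encode(array('incidentId' => $obj->ID));

$ch = curl_init("https://integration.example.com/incident-updated");  // placeholder
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);
```

Because the CPM sends only the incident ID, it never matters which snapshot of the record the async process happens to see.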

The major tradeoff with this approach is that you would have to ensure that wherever your integration script is hosted, it remains reachable; however, it is fairly simple to use monitoring services to keep an eye on the health of the server where your integration script is hosted.

Also, if you have a ton of incidents being updated, you might need to scale the integration so that you can defer requests, or build a strategy for managing the pipeline of requests. That part would be up to your discretion.

This is a very interesting issue. I do not know the solution, but I find it interesting to understand what is going on.

On one hand, you might indeed expect that when the CPM is executed, it works with the current state.

On the other hand, I think that if you trigger a CPM, you may want it to act on the data as it was at that moment. Meaning: if the contact on the incident was contact A when the CPM was triggered, it may be logical to have it work with contact A. If you trigger the CPM in async mode so that it executes later, and meanwhile another contact is set, would it be fair to use contact B? That depends on the use case. In general you want it to work with the latest data, of course, but still, perhaps for async execution some things are cached so the CPM works with the data as it was at the moment it was triggered?

Thanks Scott, Rajan and Barrilito for your inputs. This is becoming an interesting discussion.

Scott, I had already tried the ROQL method, which did not work; the delays were still there. The workaround mentioned earlier is working fine so far, so I haven't looked for another solution yet. I also suspected (and still suspect, to some extent) operational-vs-reporting database delays as the cause of this behavior.

While searching for more ideas, I stumbled upon an article on CXDeveloper which describes the same issue I am having. The problem lies with the utility server's PHP process runtime and its unpredictability.

Rajan, thanks for the innovative workarounds. I haven't tried them yet but would like to try REST and see how it goes. Please correct me if I am wrong, but I think the hosted-script solution is similar to the library-code management approach mentioned in the article above, which also did not work. I would still like to give these options a try when I get some extra time.

Barrilito, I agree that theoretically the CPM should work with the current state. However, it goes and picks up random values. During my testing, the CPMs seemed to execute almost instantaneously, and I did not make any edits until I could see that the CPM had finished executing. Your point about data being cached seems right, as Ben also mentioned in the article above. Cached data is being read, picking up stale values and giving wrong results.

I understand the issue a lot better now, thanks to all the detailed comments. I am surprised that an issue like this has not surfaced before. In my opinion, real-time integration between OSC and an external system is a common use case for CPMs, so I would expect this type of problem to come up more often. I guess it's not that common, then.

Interesting problem you're running into. The article on CXDeveloper seems to be more about different cached versions of your CPM code than about different versions of the actual data it uses, which is not described there at all. I also do not think that data is cached specifically for your CPM run, since the platform cannot know what data you need. So, given that async CPMs run on separate utility servers and you get older results, the only explanation I can see is that these utility servers use a replication database to execute the CPM. I am not completely sure about the server setup, but the replication database might actually be on the same server as the utility server. This might also explain why ROQL queries specifying the operational database do not help, since your code is actually running on a separate server.

You might double-check this by having the async CPM output the incident's last-updated date, to see whether it reflects your latest update or is running behind.
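A quick way to do that check from inside the async CPM, sketched against the Connect PHP API (the log destination is whatever your site uses; error_log is shown for simplicity):

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// Compare the UpdatedTime of the object handed to apply() with a
// freshly fetched copy; a mismatch means the CPM is seeing stale data.
$fresh = RNCPHP\Incident::fetch($obj->ID);
error_log(sprintf(
    "Incident %d: passed-in UpdatedTime=%d, fetched UpdatedTime=%d",
    $obj->ID, $obj->UpdatedTime, $fresh->UpdatedTime
));
```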

The database is not run on the utility servers. The likely problem in this case is the implementation of the CPM and the site's configuration, which is a missing variable in the discussion here, but is almost always the cause of issues/reports like this.

If the CPM is querying for data using the replication database, then there could be a delay in the data update as there is no set timing in the replication; it can be seconds, minutes, and sometimes hours. Hitting the replication DB could result in this type of behavior if the replication delay is not addressed in code. Those really long delays are pretty rare.

If you are certain that you're hitting the operational database, then the async CPM will get the data from the operational DB in the state that exists at the time of the query. What's missing here is the operation of the CPM, other CPMs, business rules, etc. It's highly likely that data is being transformed in business rules or elsewhere in the CPM, which then re-triggers the CPM because suppression is not implemented on save calls. That could easily give the impression that the CPM is working as expected while the data is invalid, old, or different than expected.

I would suggest you start with a review of your CPM code. If there are any saves, ensure that they call suppress and don't result in a loop where the CPM and/or a business rule sets the incident back to a previous state, which would make it look like cached data. If you do have business rules in play, check the business rule log to see whether your CPM is responsible for triggering a rule. Once you can rule all of that out, it would be time to submit a support ticket. However, based on the information provided in this thread, the issue is almost certainly in the implementation.
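For reference, suppression on a save inside a CPM looks like this in Connect PHP (SuppressAll prevents the save from firing business rules, external events, and the CPM itself again):

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// Any save performed inside the CPM should suppress downstream
// processing so it cannot re-trigger this CPM or business rules.
// ...mutate $incident as needed...
$incident->save(RNCPHP\RNObject::SuppressAll);
```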

Do you have any other (long-running) CPMs triggered by a contact change? There is a possibility that the async CPM executes before the sync CPMs / business rules have finished. Make sure the async one is called last in your chain of rules, and disable any independent CPMs as a test.

My understanding of asynchronous code is that if an update is made to some object, the code will not wait until that object is updated before moving on to another part of the script (as it would in synchronous programming).

The issue would then be race conditions in the script, where some values are not resolved before the final update to the contact object. It is probably the case that certain parts of the script are "running out of order": the cURL request resolves but all the updates to the contact object were made in advance because the cURL request ran on a forked process; the final update and the cURL request resolve before the first update to the contact; references are stale when the critical part of the script runs; and so on.

Strategies from a multi-threaded paradigm (mutex or semaphore) as well as strategies from a single-threaded paradigm (promises/futures) won't work here as the PHP Binary is pretty locked down.

Am I missing something here? I think there is a possibility that it could be a database issue, but I feel like the fact that this is asynchronous code jumps out more to me.

Anuj,

The pattern I am recommending is essentially the EventEmitter pattern from Node.js programming. The major difference is that instead of passing information to a separate process within the same operating system, your Service Cloud instance passes information to another server where your scripting logic lives.

The idea is that your CPM is simply responsible for firing off a function based on an event; it is not responsible for the updating logic, it is a message passer.

The integration script would be responsible for all of the updating logic; this would be hosted on separate infrastructure using the language of your choice and the REST API. This is where you send a request to the external application and make your updates to Oracle Service Cloud.

The integration script would then send a response back to the CPM indicating whether it succeeded. The CPM can log successes or failures accordingly.

Hopefully I am understanding what the issue is; please let me know if this clarifies things.

@rajan While those are good thoughts, I think you're conflating a few things, including development languages, asynchronous development paradigms, and event handlers in OSvC. Basically, the PHP script runs at a certain time in the future relative to the event that triggers it; that time is based on a scheduler process. The PHP script, when run, can get or set data in OSvC, trigger business rules, or send data to another system. The question here is that when the asynchronous script queries the database, the returned data is not in the expected state, and therefore isn't safe to integrate with the other system. That is much more likely an issue with the CPM doing something strange, or with the way this instance of OSvC manages data through business rules, etc., delivering unexpected data to the script.

Now that I think about it, possibly the only thing Anuj needs to do is add a usleep(1000) call before making an update to the incident after the cURL request.

I have some custom model code that I use to update an incident on our customer portal. If the update is successful, it returns incident data that I pull from another custom method to update the UI with the changes:
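Roughly, an update method of that shape might look like the following sketch (the method and helper names are hypothetical, not the poster's actual code):

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// Hypothetical sketch: update the incident, then return fresh data
// for the UI via a separate fetch method.
public function updateIncident($incidentId, array $fields)
{
    $incident = RNCPHP\Incident::fetch($incidentId);
    // ...apply $fields to $incident...
    $incident->save(RNCPHP\RNObject::SuppressAll);

    // Pull the post-update state back out for the UI.
    return $this->getIncidentData($incidentId);  // hypothetical helper
}
```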

Your comment about the database state made me think about what I had to do in this separate function; I had to introduce a usleep statement prior to fetching the incident data to account for the changes made in the update method:
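A hedged sketch of what that fetch-with-pause might look like (method and field names are hypothetical; note usleep(1000) pauses for 1000 microseconds, i.e. 1 ms, and the right pause length is something to tune):

```php
use \RightNow\Connect\v1_3 as RNCPHP;

// Hypothetical sketch: pause briefly so the just-committed update
// becomes visible before reading the record back.
public function getIncidentData($incidentId)
{
    usleep(1000);  // 1 ms; tune as needed
    $incident = RNCPHP\Incident::fetch($incidentId);
    return array(
        'id'      => $incident->ID,
        'status'  => $incident->StatusWithType->Status->LookupName,
        'contact' => $incident->PrimaryContact->ID,
    );
}
```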