Here is a challenge I have run into repeatedly. I encountered it again this week, so I figured it was time to write about it. What I call Explicit State and Process Knowledge is a clear record indicating the state of a set of data as well as the process that last operated on it or moved it into its current state. This applies to data models, data exchanges, and other parts of integration.

Many data models do not address this type of data recording. Data audit models are not usually designed for this purpose, and when designed for technical audits they are insufficient. For instance, an audit may record the program that changed the data, who changed it, and when. However, there could be many processes or rules within a program. Which one caused the state change? Additionally, audits indicate which column(s) or field(s) changed, but beyond current-record indicators there is usually no explicit state model captured with respect to the business processes. The state-change history is rarely recorded. Explicit information about the state, and about the process that produced it, is simply not available. As a result, applications and integrations often must interpret the data with complex logic to determine the context in order to process it properly. This seems funny, since the information should be clear, but I would bet the percentage of integrations forced to work around this problem is very large.

I encountered this recently. A third-party application sends its results to a legacy system as part of a distributed process, which controls and impacts downstream processing. The first application is a specialized third-party (purchased) package. The standard data extract from it is a straight table dump of ALL the records for that transaction over time (yes, over time). Since the table dump is cumulative, the data set grows and always includes everything, including previously sent information. A table dump is produced whenever any process change occurs.
However, there is no record or indicator of what business process triggered the table dump or what the relevant state of the data is. The extract is simply an ever-growing set of data. Obviously, some state changed because a particular business process executed. However, that is not recorded in the application, so there is no record of it in the data extract. By the way, I am not kidding.

From an integration and distributed-business-process standpoint, this table dump is initially pure noise. You cannot tell why it is happening or what in the data is relevant. The first step is giving the transaction and data a context, so you know what happened and why (ironic, since at the moment the data is changed, the knowledge of what is happening and why is completely known). This is accomplished by comparing the current extract to the previous one. The rules for analyzing the differences in the data values to determine the what and the why are very complex. From a business process standpoint this does nothing beneficial for the business. You get the picture.

I see this same problem in applications, including ones with new data models. This week I was working on the implementation of a new process. Two of its steps were automated and two were manual. The state chart clearly showed the states, transitions, guard conditions, and so on. The application developers wanted to capture only one of the automated states and its transition conditions. There are business processes in place that record the information from each manual step, but that information was not originally intended to be captured, and adding a capability to record the manual process state changes is trivial. Explaining the importance of this didn't really sink in, even with requirements that demanded recovery from any sub-process failure. The development team wanted (or was willing) to disperse this information, making recovery more difficult due to the lack of a centralized record of state history and process activity. Why?
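To make the diff-and-guess approach concrete, here is a minimal sketch of the kind of logic the extract forces onto the integration side. The field names (`order_id`, `status`, `ship_date`) and the inference rules are hypothetical, not from the actual system; real rule sets grow far more convoluted than this.

```python
def infer_changes(previous, current):
    """Compare two full table dumps keyed by record id and guess what happened.

    Returns a list of (record_id, guessed_event) tuples. The guessing is
    brittle by nature: any change in the business rules on the source side
    silently invalidates these inferences.
    """
    prev_by_id = {row["order_id"]: row for row in previous}
    events = []
    for row in current:
        old = prev_by_id.get(row["order_id"])
        if old is None:
            events.append((row["order_id"], "new record - process unknown"))
        elif old != row:
            # Infer the business event from which columns moved.
            if old.get("status") != row.get("status"):
                events.append((row["order_id"],
                               f"status moved to {row['status']} - cause unknown"))
            else:
                events.append((row["order_id"], "fields changed - cause unknown"))
    return events

# Two successive dumps of the same ever-growing extract (illustrative data).
previous = [{"order_id": 1, "status": "open", "ship_date": None}]
current = [
    {"order_id": 1, "status": "shipped", "ship_date": "2024-05-01"},
    {"order_id": 2, "status": "open", "ship_date": None},
]
print(infer_changes(previous, current))
```

Note that even after all this work, the output can only say "cause unknown": the column diff tells you *that* something changed, never *which* business process changed it.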
Certainly, it is fun programming convoluted logic to determine what is going on in the data. This kind of activity requires some very creative, logical mental gymnastics. However, from a business viewpoint, reverse-engineering data state and process activity is wasted effort. The funny thing is that executing applications (and processes) have explicit knowledge of the change in data state, and why it happened, as they run. Simply recording it and relating it to the entity and/or transaction as the application executes is straightforward. It has many benefits, including:

1. Business Analytics/Reporting: it is much easier to select sets of records based on state and process knowledge than on conditional logic, with fewer errors - this improves the accuracy of analysis (less variance in the accuracy of process assessment).

2. Integration: knowing the state of a transaction and which process is responsible for that state improves the granularity of these efforts and saves a significant amount of complex and often highly brittle software (otherwise, any change in the business rules implemented on the source side requires changes in the integration logic).

3. Applications: with this information the selection of data for processes becomes trivial, and other application processes require substantially less logic.

4. Event model (Publish-Subscribe): this architectural pattern is much easier to implement in this type of environment. Lack of explicit state and process knowledge often prevents implementation of this pattern (though the failure is rarely attributed to this cause).

The benefits are so clear to me, yet this information is captured and made available so rarely. I find this true across the board: in custom and third-party applications, and throughout the integration space. I estimate that the logic devoted to this kind of reverse-engineering often exceeds 30% of an integration's code; with an explicit state and process record in the data model or data transfer, it drops below 5%. In addition, there would be significantly fewer defects.
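Here is a sketch of the alternative: the running process records its own knowledge at the moment of the transition, instead of forcing downstream consumers to reconstruct it later. The names (`record_transition`, the state and process strings) are illustrative assumptions, not from the post.

```python
from datetime import datetime, timezone

# In a real system this would be a history table related to the
# entity/transaction; a list stands in for it here.
state_history = []

def record_transition(entity_id, new_state, process, reason):
    """Append an explicit state/process record as the application executes.

    The application knows exactly what is happening and why at this moment,
    so recording it costs almost nothing.
    """
    entry = {
        "entity_id": entity_id,
        "state": new_state,
        "process": process,  # which process/rule caused the state change
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    state_history.append(entry)
    return entry

# Each step of the distributed process records its own transition.
record_transition(42, "credit_approved", "CreditCheck", "score above threshold")
record_transition(42, "shipped", "Fulfillment", "all items picked")

# Benefit: selecting records by state and responsible process is trivial,
# with no diffing or guessing.
shipped = [e for e in state_history if e["state"] == "shipped"]
print(shipped)
```

The same record also gives you recovery for free: after a sub-process failure, the history tells you exactly which step last completed for each entity.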
Think about it. If you are building a new data model or data extract, add a state record and process history to the entities/extract. All downstream processing will thank you for the effort. If this is in the application data model, there will be many additional benefits. Investing some design time up front on this issue is well worth it.

Sometimes the trenches are more like a maze than a straight path. I really appreciate knowing where I came from, where I am, and seeing where I am going. Have fun.
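As a closing illustration, here is one way "add a state record and process history to the entities" might look in a data model. This is a minimal sketch using an in-memory SQLite database; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        current_state TEXT NOT NULL          -- explicit current state
    );
    CREATE TABLE order_state_history (
        order_id   INTEGER REFERENCES orders(order_id),
        state      TEXT NOT NULL,            -- state the entity entered
        process    TEXT NOT NULL,            -- process that caused the change
        changed_at TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO orders VALUES (1, 'shipped')")
conn.executemany(
    "INSERT INTO order_state_history VALUES (?, ?, ?, ?)",
    [(1, "open", "OrderEntry", "2024-05-01T09:00:00"),
     (1, "shipped", "Fulfillment", "2024-05-02T14:30:00")],
)
conn.commit()

# Downstream selection by state and responsible process is a plain query,
# not a diff-and-guess exercise against successive extracts.
rows = conn.execute(
    "SELECT state, process FROM order_state_history "
    "WHERE order_id = 1 ORDER BY changed_at"
).fetchall()
print(rows)
```

A data extract built from these two tables carries its own context: every record arrives with its state and the process that produced it.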