1 An index to the state of the art and an outline of open research challenges at DIIAG
Claudio Di Ciccio, Massimo Mecella
Seminars in Software and Services for the Information Society
Rome, May 7th, 2012

2 Definition
Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions from a set of recorded real executions (logs).
ProM [AalstEtAl2009] is one of the most widely used plugin-based software environments for implementing workflow mining (and more) techniques. The new version 6.0 is available for download.

4 Further reading
The rest of the lesson is based on the following material:
- Van der Aalst, W. M. P.: Process Discovery: An Introduction. Available at process_mining_chapter_05_process_discovery.pdf, from the teaching material for [Aalst2011.book]
- De Medeiros, A. K. A.: Control-Flow Mining Algorithms. Available at lecture3_controlflowminingalgorithms.ppt

5 A different context (1): Artful processes and knowledge workers
- Artful processes [HillEtAl06]: informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)
- Such people are known as knowledge workers [ACTIVE09]
- Knowledge workers create artful processes on the fly
- Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared

6 A different context (2): conversations
- In collaborative contexts, knowledge workers (e.g., a software development manager) share their information and outcomes with other knowledge workers
- Typically, they do so by means of several conversations
- Conversations are actual traces of the running processes that knowledge workers adhere to

7 A different context (3): Processes from conversations
- From the collection of messages, you can extract the processes that lie behind them; related conversations are traces of their runs
- Valuable advantages for users:
  - Automated discovery of formal representations, with no effort for knowledge workers
  - Tidy organization for naïve best practices kept only in mind
  - Opportunity to share and compare the knowledge on methodologies
  - Automated discovery of bottlenecks, delays and structural defects from the analysis of previous runs
- Conversations are a kind of semi-structured text: this approach is not specific to electronic mail and can be extended to the analysis of other semi-structured texts

8 A different context (4): Some areas of applicability
- Personal information management (PIM): how to organize one's own activities, contacts, etc. through the usage of software
- Information warfare: in supporting anti-crime intelligence agencies
- Enterprise engineering: for knowledge-heavy industries, where preserving the documents making up product data is not enough
- eHealth: for the automatic discovery of medical treatment procedures on top of patient health records

9 MailOfMine: What is MailOfMine?
MailOfMine is the approach and the implementation of a collection of techniques whose aim is to automatically build, on top of a collection of messages, a set of workflow models that represent the artful processes lying behind the knowledge workers' activities. [DiCiccioEtAl11] [DiCiccioMecella12] [DiCiccioMecella/TR12]

10 On the visualization of processes: The imperative model
- Represents the whole process at once
- The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

11 On the visualization of processes: The declarative model
- Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints
- The idea is that every task can be performed, except the ones which do not respect such constraints
- This technique fits processes that are highly flexible and subject to changes, such as artful processes
- The notation here is based on [AalstEtAl06, MaggiEtAl11] (DecSerFlow, Declare)
- Example constraints:
  - If A is performed, B must be performed, no matter whether before or afterwards (responded existence)
  - Whenever B is performed, C must be performed afterwards, and B cannot be repeated until C is done (alternate response)

12 On the visualization of processes: Imperative vs declarative
Declarative models work better in the presence of a partial specification of the process scheme

14 Declare constraint templates: Relation templates
- RespondedExistence(A, B): if A occurs in the process instance, then B occurs as well
  Example traces: CAC, CAACB, BCAC, BCC
- Response(A, B): if A occurs in the process instance, then B occurs after A
  Example traces: BCAAC, CAACB, CAC, BCC
- AlternateResponse(A, B): each time A occurs in the process instance, B occurs afterwards, before A recurs
  Example traces: BCAAC, CAACB, CACB, CABCA, BCC, CACBBAB
- ChainResponse(A, B): each time A occurs in the process instance, B occurs immediately afterwards
  Example traces: BCAAC, BCAABC, BCABABC
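
The four templates on this slide can be sketched as simple checks over a trace. The sketch below assumes that a trace is a string of single-character activity names and that A and B are distinct; the function names are hypothetical and chosen here for illustration only, they are not part of Declare or of any tool.

```python
def responded_existence(trace, a, b):
    # If a occurs anywhere in the trace, b must occur too (before or after).
    return a not in trace or b in trace


def response(trace, a, b):
    # Every a needs a later b; it suffices to check the last a.
    last_a = trace.rfind(a)
    return last_a == -1 or b in trace[last_a + 1:]


def alternate_response(trace, a, b):
    # After each a, a b must occur before the next a.
    pending = False  # True while an a is still waiting for its b
    for event in trace:
        if event == a:
            if pending:
                return False
            pending = True
        elif event == b:
            pending = False
    return not pending


def chain_response(trace, a, b):
    # Each a must be immediately followed by b.
    return all(trace[i + 1:i + 2] == b
               for i, event in enumerate(trace) if event == a)


# Classify the example traces listed for Response(A, B).
for trace in ["BCAAC", "CAACB", "CAC", "BCC"]:
    print(trace, response(trace, "A", "B"))
```

Running the checks over the example traces separates those that comply with a template from those that violate it. Note that a trace without any A (such as BCC) satisfies all four templates vacuously.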

15 Declare constraint templates: Relation templates
- RespondedExistence(B, A): if B occurs in the process instance, then A occurs as well
  Example traces: CAC, CAACB, BCAC, BCC
- Precedence(A, B): B occurs in the process instance only if preceded by A
  Example traces: BCAAC, CAACB, CAC, BCC
- AlternatePrecedence(A, B): each time B occurs in the process instance, it is preceded by A, and no other B can recur in between
  Example traces: BCAAC, CAACB, CACB, CABCA, BCC, CACBAB
- ChainPrecedence(A, B): each time B occurs in the process instance, A occurs immediately beforehand
  Example traces: BCAAC, BCAABC, CABABCA

16 Declare constraint templates: Relation templates
- CoExistence(A, B): if B occurs in the process instance, then A occurs, and vice versa
  Example traces: CAC, CAACB, BCAC, BCC
- Succession(A, B): A occurs if and only if it is followed by B in the process instance
  Example traces: BCAAC, CAACB, CAC, BCC
- AlternateSuccession(A, B): A and B occur in the process instance if and only if the latter follows the former, and they alternate with each other in the trace
  Example traces: BCAAC, CAACB, CACB, CABCA, BCC, CACBAB
- ChainSuccession(A, B): A and B occur in the process instance if and only if the latter immediately follows the former
  Example traces: BCAAC, BCAABC, CABABC

17 Declare constraint templates: Negative relation templates
- NotCoExistence(A, B): A and B never occur together in the process instance
  Example traces: CAC, CAACB, BCAC, BCC
- NotSuccession(A, B): A can never occur before B in the process instance
  Example traces: BCAAC, CAACB, CAC, BCC
- NotChainSuccession(A, B): A and B occur in the process instance if and only if the latter does not immediately follow the former
  Example traces: BCAAC, BCAABC, CBACBA
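
The negative templates lend themselves to very short checks. This is a minimal sketch under the same illustrative assumptions as before (a trace is a string of single-character activity names, A and B are distinct); the function names are hypothetical.

```python
def not_co_existence(trace, a, b):
    # a and b must not both appear in the same trace.
    return not (a in trace and b in trace)


def not_succession(trace, a, b):
    # No b may occur after an a: check everything after the first a.
    first_a = trace.find(a)
    return first_a == -1 or b not in trace[first_a + 1:]


def not_chain_succession(trace, a, b):
    # b must never come immediately after a.
    return (a + b) not in trace


# Classify the example traces listed for NotChainSuccession(A, B).
for trace in ["BCAAC", "BCAABC", "CBACBA"]:
    print(trace, not_chain_succession(trace, "A", "B"))
```

In CBACBA every B precedes an A but never follows one directly, so the trace complies with NotChainSuccession(A, B) even though both activities occur several times.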

18 Relation constraint templates: subsumption
Constraint templates are not independent of each other
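
One way to observe this dependency is to check, on every trace up to a given length, that a stricter template never holds where a looser one fails. The sketch below does this for the precedence family: ChainPrecedence(A, B) implies AlternatePrecedence(A, B), which implies Precedence(A, B), which implies RespondedExistence(B, A). It is a bounded check over a three-letter alphabet, not a proof, and all function names are hypothetical.

```python
from itertools import product


def responded_existence(trace, a, b):
    # If a occurs, b occurs as well.
    return a not in trace or b in trace


def precedence(trace, a, b):
    # The first b (if any) must have an a somewhere before it.
    first_b = trace.find(b)
    return first_b == -1 or a in trace[:first_b]


def alternate_precedence(trace, a, b):
    # Each b needs an a since the previous b.
    armed = False  # True when an a has occurred since the last b
    for event in trace:
        if event == a:
            armed = True
        elif event == b:
            if not armed:
                return False
            armed = False
    return True


def chain_precedence(trace, a, b):
    # Each b must be immediately preceded by a.
    return all(i > 0 and trace[i - 1] == a
               for i, event in enumerate(trace) if event == b)


def implies_on_all_traces(stronger, weaker, alphabet="ABC", max_len=6):
    # True if no trace up to max_len satisfies `stronger` but not `weaker`.
    for n in range(max_len + 1):
        for events in product(alphabet, repeat=n):
            trace = "".join(events)
            if stronger(trace) and not weaker(trace):
                return False
    return True


chain = lambda t: chain_precedence(t, "A", "B")
alternate = lambda t: alternate_precedence(t, "A", "B")
plain = lambda t: precedence(t, "A", "B")
existence = lambda t: responded_existence(t, "B", "A")

print(implies_on_all_traces(chain, alternate))
print(implies_on_all_traces(alternate, plain))
print(implies_on_all_traces(plain, existence))
print(implies_on_all_traces(plain, alternate))  # the converse does not hold
```

A practical consequence is that a miner which finds ChainPrecedence(A, B) need not report the weaker constraints it subsumes.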

20 MINERful: The declarative workflow mining algorithm of MailOfMine
- Key idea: building a knowledge base with local and global statistics on the mutual order of appearance of events, for fast querying later on
- Performance: the algorithm has proven to be fast (over 12 million events processed in less than 170 seconds)
- Asymptotically:
  - linear in the number of traces
  - quadratic in the number of events per trace (i.e., polynomial in the input size)
  - linear in the number of constraint templates
- See [DiCiccioMecella/TR12] for further reading
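
To give a flavour of the key idea, the sketch below builds a tiny table of order statistics in one pass over the log and then answers a query about Response(a, b) without rescanning the traces. This is not the MINERful algorithm itself (see [DiCiccioMecella/TR12] for that): the choice of statistics, the support formula and the names `build_stats` and `response_support` are all assumptions made for illustration.

```python
from collections import Counter


def build_stats(log):
    # Pass 1: collect, for each activity a, how often it occurs, and for
    # each ordered pair (a, b), how often a occurs with no b later on
    # in the same trace.
    occurrences = Counter()
    unanswered = Counter()
    alphabet = {event for trace in log for event in trace}
    for trace in log:
        for i, a in enumerate(trace):
            occurrences[a] += 1
            later = set(trace[i + 1:])
            for b in alphabet - {a}:
                if b not in later:
                    unanswered[(a, b)] += 1
    return occurrences, unanswered


def response_support(stats, a, b):
    # Query: fraction of occurrences of a that are eventually followed by b.
    occurrences, unanswered = stats
    if occurrences[a] == 0:
        return 1.0  # vacuously satisfied: a never occurs
    return 1 - unanswered[(a, b)] / occurrences[a]


log = ["CAACB", "CACB", "BCAAC"]  # traces as strings of activity names
stats = build_stats(log)
print(response_support(stats, "A", "B"))
```

In this toy log, A occurs five times and three of those occurrences are eventually followed by B. The scan looks at the remainder of the trace for every event, which gives a cost that is linear in the number of traces and quadratic in the trace length, in line with the asymptotic behaviour listed on the slide.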

22 On the representation of artful process schemata: Regular grammars expressing declarative workflows
In MailOfMine, each constraint in the set which can be used to define a mined artful process is expressible through regular grammars, where:
- activities are terminal characters, the building blocks of constraints on tasks;
- constraints are regular expressions, equivalent to regular grammars;
- the process scheme is the intersection of the constraints defined on top of activities.
The process scheme defines a Process Describing Grammar (PDG)
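
The three points above can be sketched with Python's `re` module: activities are characters, each constraint is a regular expression over them, and a trace belongs to the process scheme only if it matches every constraint. The regular expressions below are one possible encoding, written for this example under the assumption that activities are single letters; the helper names are hypothetical.

```python
import re


def response(a, b):
    # Every a is eventually followed by a b.
    return re.compile(f"[^{a}]*({a}.*{b})*[^{a}]*")


def precedence(a, b):
    # A b may occur only after some a has occurred.
    return re.compile(f"[^{b}]*({a}.*{b})*[^{b}]*")


def alternate_response(a, b):
    # After each a, a b occurs before the next a.
    return re.compile(f"[^{a}]*({a}[^{a}]*{b}[^{a}]*)*")


def conforms(trace, constraints):
    # Intersection of the constraint languages: the whole trace must
    # match every regular expression.
    return all(c.fullmatch(trace) for c in constraints)


# A small scheme: Response(A, B) and Precedence(A, B) together.
scheme = [response("A", "B"), precedence("A", "B")]
for trace in ["BCAAC", "CAACB", "CAC", "BCC"]:
    print(trace, conforms(trace, scheme))
```

Taken together, Response(A, B) and Precedence(A, B) express Succession(A, B), so the scheme accepts CAACB and rejects the other three traces.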

23 On the usage of regular grammars: The rationale: why not LTL for declarative workflows?
- Temporal logic is a formalism for describing sequences of transitions between states in a reactive system
- Linear Temporal Logic (LTL, [Pnueli77]) describes events along a single computation path
- LTL formulæ are verified over semi-infinite runs defined over Kripke structures
  - They are good for automatically checking the correct operation of circuits or server programs
  - Not for human processes, which have both a starting point and an end: "In the long run, we are all dead" (John Maynard Keynes)
- Regular grammars are verified by Finite State Automata, which work with less complex algorithms in terms of computational effort
- A PDG describes the language spoken by collaborative organisms in terms of activities
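
The point about finite traces can be illustrated with a hand-written automaton for Response(A, B). This is a minimal sketch: the state names, the transition table and the `run_dfa` helper are assumptions made for this example. The trace is read once, left to right, and the verdict is taken when the trace ends.

```python
def run_dfa(transitions, start, accepting, trace):
    # Read the trace one event at a time; events without an explicit
    # transition leave the current state unchanged.
    state = start
    for event in trace:
        state = transitions.get((state, event), state)
    # The verdict is taken at the end of the (finite) trace.
    return state in accepting


# Response(A, B): after an A the automaton waits until a B is seen.
RESPONSE_A_B = {
    ("idle", "A"): "waiting",
    ("waiting", "B"): "idle",
}

for trace in ["BCAAC", "CAACB", "CAC", "BCC"]:
    print(trace, run_dfa(RESPONSE_A_B, "idle", {"idle"}, trace))
```

Each event costs a single dictionary lookup, so checking a trace is linear in its length, and acceptance is well defined precisely because the trace has an end.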

25 On the visualization of processes: An example of DecSerFlow [VanDerAalstEtAl06] notation
- You could even start from here
- No, it is not the initial action
- You might want to run a legal trace like this: a3, a3, a3, a2, a2, a3, a4, a5, a6, a7, a6, a5
- What we want to state here is that such a notation is probably not quite intuitive

26 On the visualization of processes: Our proposal
We do not consider a static graph-based global representation alone to be the most suitable solution. A graphical representation, easy to understand at first glance, must be used.
Idea:
- when presenting the process schema (static view):
  1) a local view on tasks/activities, showing related constraints only;
  2) a global view on the process, either:
     a) basic (less information, fewer symbols), or
     b) extended (more information, more symbols, extending (a));
  (2) can work as a kind of navigation map for (1)
- when presenting the running instance (dynamic view): a dynamic interactive trace representation diagram, based on the local static view notation.
See [DiCiccioEtAl2011] for further reading

27 On the visualization of processes: Introducing the new local view: the rationale

28 On the visualization of constraints: The static local view: some examples
